Hidden Features To 10x Your Hermes Agent Setup

AI LABS

Transcript

00:00:00 Ever since we started using Hermes, we've set up a lot of our workflows on it. As we showed you in
00:00:04 the previous videos, it's been monitoring our apps, coordinating the team on Slack, and more. But the
00:00:09 more we used it, the more we ran into the same problems, and it started to feel like our setup
00:00:13 wasn't enough. Like we always do, we started looking for ways to solve the issues, but that's when we
00:00:18 realized we didn't need to add anything else, because everything we needed was already in Hermes
00:00:23 itself; we just weren't using it to its full potential. Now, if you're new to the channel,
00:00:27 welcome! We're a software company, and this is AI Labs, where we show you how to optimize a business
00:00:32 with AI using proven methods from our own team. In this video, we're going through all the settings
00:00:37 we changed to improve our workflow. The first category is all about context and output limits.
00:00:43 Hermes uses the .hermes folder, which holds all the configs and info that run the agent, and all of
00:00:49 that lives in one single file called config.yaml. It's a really long file, and it contains every config
00:00:55 tied to the agent setup. If you're managing multiple profiles like we are, each one gets its
00:01:00 own separate folder, and every profile has its own config.yaml file. The first setting we'll change is
00:01:06 max bytes. By default, this is set to 50,000, which means it pulls 50,000 characters from any tool output
00:01:13 into the context window at once, and the rest gets cut off. That became a problem when we were using it
00:01:17 to monitor test runs, because it wouldn't properly see the issues when the output was long, so we needed more of
00:01:23 that output in the context window. For that, you can either set max bytes directly in the config.yaml file
00:01:29 or change it to the number you need using the hermes config command. Once that's done, it pulls that many
00:01:34 characters into the context window from all tool outputs. But you'll need to make sure the right
00:01:39 profile is selected, because the changes you make with the hermes config command show up in your active
00:01:44 profile's config file. Another problem shows up when the agent reads a file with a lot of lines. This happened to
00:01:50 us when we connected Hermes to our company's knowledge base, where we have large policy documents that
00:01:55 easily run past 2,000 lines, so when it pulled them in by breaking them into chunks, it kept missing
00:02:01 important details. So we set the line limit to 5,000 and let the agent read more of the file at once. There's another
00:02:07 limit that becomes a problem when you have a lot of large markdown files. If your document has long
00:02:12 paragraphs stored as a single line, and that line is more than 2,000 characters, it won't be fully read. If you
00:02:18 want to increase that, you can change it with the hermes config command and set the character count you
00:02:23 need; that way the agent can read more than 2,000 characters in a single line. The first three settings
00:02:28 mostly matter if you work with large files, but this next one is important for everyone, and that's the
00:02:33 compression threshold. By default, the compression threshold is set to 50%, which means once 50% of
00:02:39 the context window is filled, it compresses everything in there. But a lot of other agents, like Codex and
00:02:45 Claude Code, have this set to around 75%. We ran into this ourselves while running Hermes: since we'd set
00:02:51 it up with a smaller model on a 200,000-token context window, it compressed too early, which isn't ideal when you
00:02:56 actually want to get things done. Models like Opus or the Gemini ones with a million-token window
00:03:02 would be fine here, because compression only happens at 500,000 tokens for them, but for models with a 200,000-token
00:03:08 context window, it happens at 100,000 tokens, which causes issues on a long run. So we set the compression threshold
00:03:15 to 0.75; that way we can use at least 75% of the context window before it hits compression. Another
00:03:22 setting is called target ratio, which is set to 20% by default. When Hermes compresses, it doesn't
00:03:28 compress the entire chat. Instead, it leaves 20% of the conversation uncompressed and starts the new
00:03:34 conversation with that uncompressed part along with the summary, so that uncompressed 20% becomes your tail
00:03:40 once the new compressed conversation starts. How much is left uncompressed depends on how big
00:03:45 your context window is: for a 1-million-token context window, 100,000 tokens get carried over, and for a 200,000
00:03:52 token context window, only 20,000 tokens get carried over. This tail gives the agent more context on the
00:03:58 previous conversation so it can pick up easily. 20% works for us on a 200,000-token context window, but if
00:04:04 you're on a larger model, you can use the config command to set it higher. The ideal range is between
00:04:09 10 and 80 percent: the higher the number, the more tokens stay in your context window, but you'll also
00:04:15 have less free room to work with. As we talked about in the previous video, the memory.md and user.md
00:04:21 files that Hermes keeps have a hard limit on how many characters they can hold; after that, Hermes
00:04:26 starts dropping information the agent thinks it no longer needs. You can change these limits too, either
00:04:32 directly in your config.yaml file or through the Hermes desktop app from the settings pane. From there,
00:04:37 you can also change most of the settings we just talked about. And if you're enjoying the video so far,
00:04:42 subscribe to the channel and hit the hype button. This small gesture of support goes a long way for us.
00:04:47 The second category is sub-agents. On Hermes, you're limited to spawning three sub-agents at once, and when we
00:04:53 were working on our projects, we hit this limit and things ended up taking longer than they needed to.
00:04:58 In the config, this limit comes from the max concurrent children value, which is set to three by default.
00:05:03 Since we were running into issues, we used the config command and changed this value to five. From that
00:05:08 point on, whenever it spins up sub-agents, it can run up to five of them together. But this is token
00:05:13 heavy, so if you're working with a lot of sub-agents, cost is something you need to watch out for. Now, in
00:05:18 Claude Code, each sub-agent can create its own sub-agents, and that's helpful when you're working
00:05:23 with a large folder, where one agent can branch out into more agents to explore nested repos. But Hermes
00:05:28 blocks this with the max spawn depth value, which is set to one by default, and that stops any sub-agent
00:05:34 from creating more. You can push max spawn depth above one; after that, your sub-agents can create
00:05:39 their own sub-agents too. There's another sub-agent setting called auto-approve, which is set to false by
00:05:44 default. This means the sub-agents you spawn only inherit the parent's permissions, and they might
00:05:50 still get blocked by permission prompts. If you want to change this, you can set it to true directly
00:05:55 in the config. Once you've done that, your sub-agents can run in auto-approve mode and won't get blocked by any
00:06:00 permission prompts. Sub-agents handle simple tasks like web searches that don't need the heavy lifting of
00:06:06 your main model, but running them on that powerful model burns a lot of money for work like this, so you
00:06:11 can change the model used for any sub-agent and switch it to a smaller one, which saves you tokens.
00:06:16 And if that smaller model is from a different provider, you can add it using the hermes auth command,
00:06:21 which lets you pull in models from whichever provider you want. But before we move on to
00:06:25 the settings that save us costs, let's hear a word from our sponsor, Helix. Every week there's a new AI
00:06:31 tool that helps you build apps, websites, and products faster than ever, but nobody talks about what happens
00:06:37 before you start building. Most people jump straight into coding with a half-baked idea and end up
00:06:42 rebuilding the same thing three times. Helix is an AI-guided product planning platform that takes a rough
00:06:48 idea and turns it into a structured plan you can actually hand off to a developer or a stakeholder.
00:06:53 You describe your idea in one sentence, and five AI specialist agents go to work covering validation,
00:06:59 market research, product development, business modeling, and growth strategy. It pulls live market data in real
00:07:05 time, connects to over 20 tools you already use, like Notion, Jira, and Airtable, and the canvas adapts to
00:07:11 your actual product needs instead of forcing you into a generic template. When you're done, you export an
00:07:16 investor-ready PDF blueprint that's actually built on real research, not guesswork. Click the link in
00:07:21 the description and try Helix for free. The third category is cost: these are the settings that save you
00:07:27 tokens. When you first set up Hermes, you give it models for different purposes, but you can set up
00:07:32 auxiliary models as well. Auxiliary models are the cheaper, faster ones that Hermes uses for
00:07:38 background subtasks; that way the expensive main model you've set up isn't wasted on small tasks that
00:07:43 aren't that complicated. By default, when you leave the auxiliary models empty, Hermes falls back to the
00:07:48 lowest-cost model in your config. Since we were using OpenRouter, it was set to Gemini Flash, so these
00:07:54 cheaper models could handle tasks behind the scenes. If you want to save costs, you can set up cheaper
00:07:59 models manually; they can save you a lot of money on tasks like web searches or compression. If your main
00:08:05 model is something like Opus, you probably don't want to waste it on trivial tasks. On saving costs, another
00:08:11 thing you can configure is the effort level of the model you're using. Effort is basically how much
00:08:15 thinking the model puts into a task: if the effort is higher, the output will be better, but the
00:08:20 tokens consumed will also be higher. You can set it to low or minimum so the agent doesn't waste tokens.
00:08:26 You can also turn off thinking completely if you don't want to use effort levels. The fourth category is
00:08:32 workflow, and it covers a bunch of other features that make Hermes so much better to use. The first one is
00:08:37 quick commands. If you've been using Claude Code, you might know slash commands, where you add custom reusable
00:08:43 instructions. Quick commands do a similar job, but Hermes handles them differently, because it doesn't use prompt
00:08:48 instructions the way Claude Code and other agents do. Quick commands come in two types. The first one is
00:08:54 exec, which runs a terminal command and drops its output into the context window. This is helpful for
00:08:59 creating scripts that run a whole series of commands from just a single one; for example, for git operations,
00:09:04 you can set up a custom exec command and run it whenever you want the agent to use those commands. The
00:09:09 other type is alias. This is less of a custom command and more of a way to rename existing ones. For
00:09:15 example, if you want a quicker way to run compress, you can set an alias to just a single letter and run it
00:09:20 fast. There's no direct command to set this up, so you have to do it in config.yaml, or you can just
00:09:25 ask Claude Code or Hermes to do it for you and it'll make the changes itself. Aside from that, Hermes has
00:09:31 a checkpointing mechanism too. A checkpoint is basically a saved state of your files at a certain point in
00:09:36 time that you can roll back to if an experiment breaks something. It's turned off by default, so you'll have to
00:09:41 set it to true. Once checkpointing is on, you can use the rollback command to go back to a previous
00:09:46 checkpoint. Another thing you can change is background process notifications: if you set this to all, you'll
00:09:52 get a notification for everything Hermes is doing in the background, and you can change it if you don't want
00:09:56 those. There's also a variable called HERMES_EPHEMERAL_SYSTEM_PROMPT, which lets you add content to the
00:10:01 system prompt of the agent. It's an environment variable, and whatever instruction you set as its value
00:10:06 becomes part of the system prompt, so you can add whatever instructions you want this way. But this
00:10:11 prompt only applies to the session you open in that terminal, and it doesn't stick around long term.
00:10:16 So it's mainly useful for one-time use cases. You can also run Hermes in YOLO mode, which is the same
00:10:22 as the --dangerously-skip-permissions mode in Claude Code. This stops the agent from sitting there waiting for
00:10:27 you to approve every action. You can turn it on with the yolo command or by launching Hermes with the
00:10:32 yolo flag in the terminal. At one point, we ran into an error and weren't sure if it was coming from Hermes
00:10:37 itself or from some config we'd set up. That's when we came across the ignore user config mode: it strips
00:10:43 the agent of all the configs in your .hermes folder and runs it in isolation, so you can figure
00:10:48 out what's actually causing the error and fix it. You can also switch between the multiple personalities
00:10:53 that come with it and have fun with the different voice styles already in the configs, using the personality
00:10:59 command. Since a lot of people have been asking about it, we've put together a starter pack with all the
00:11:03 guides and resources you'll need. It's available inside our community, AI Labs Pro, so if you'd like
00:11:09 to support the channel and get access to this resource pack, be sure to check it out; the link is in the
00:11:14 description. That brings us to the end of this video. If you'd like to support the channel and help us keep
00:11:19 making videos like this, you can do so by using the Super Thanks button below. As always, thank you for
00:11:24 watching, and I'll see you in the next one.

Key Takeaway

Optimizing Hermes configuration files, specifically context limits, compression thresholds, and sub-agent concurrency, significantly improves agent performance and efficiency for complex workflows.

Highlights

  • Increase 'max bytes' in config.yaml to pull more than the default 50,000 characters from tool outputs into the context window.

  • Setting the compression threshold to 0.75 instead of the default 0.50 allows the agent to utilize 75% of the context window before triggering compression.

  • Raise the 'max concurrent children' limit from 3 to 5 to run more sub-agents simultaneously, though this increases token consumption.

  • Enable 'auto-approve' for sub-agents so they are not blocked by permission prompts; by default they only inherit the parent's permissions and can still stall on prompts.

  • Configure auxiliary models in the config to offload trivial background tasks to cheaper, faster models like Gemini Flash.

  • Use 'checkpointing' and the 'rollback' command to revert file states to previous saved versions if an experiment fails.
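The video does not describe how Hermes stores checkpoints internally, so the following is only a conceptual Python sketch of the checkpoint-and-rollback idea: it snapshots a working directory by copying it and restores a chosen snapshot on request. The `Checkpointer` class and its method names are hypothetical and are not Hermes's API or storage format.

```python
import shutil
import tempfile
from pathlib import Path


class Checkpointer:
    """Copy-based snapshots of a working directory (conceptual sketch only)."""

    def __init__(self, workdir: Path) -> None:
        self.workdir = Path(workdir)
        # Snapshots live in a separate temporary directory, one subfolder each.
        self._store = Path(tempfile.mkdtemp(prefix="checkpoints-"))
        self._count = 0

    def checkpoint(self) -> int:
        """Save the current state of workdir and return its checkpoint id."""
        self._count += 1
        shutil.copytree(self.workdir, self._store / str(self._count))
        return self._count

    def rollback(self, checkpoint_id: int) -> None:
        """Replace workdir with the saved snapshot, discarding later changes."""
        snapshot = self._store / str(checkpoint_id)
        if not snapshot.is_dir():
            raise KeyError(f"no checkpoint {checkpoint_id}")
        shutil.rmtree(self.workdir)
        shutil.copytree(snapshot, self.workdir)


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp(prefix="project-"))
    (work / "app.py").write_text("print('v1')\n")
    cp = Checkpointer(work)
    first = cp.checkpoint()
    (work / "app.py").write_text("broken experiment\n")  # the risky change
    cp.rollback(first)
    print((work / "app.py").read_text(), end="")  # prints: print('v1')
```

Copying whole directories is the simplest way to show the idea; a real tool would more likely store diffs or use version control to keep snapshots cheap.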

Timeline

Context and Output Management

  • Default context limits often truncate important tool output and file data.
  • Raising the compression threshold avoids premature compression on models with 200,000-token context windows.
  • Target ratio settings determine how much conversation context remains uncompressed after reaching the threshold.
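To make the three read limits concrete, here is a small Python model of what they do to the text the agent sees. The `ReadLimits` class, its field names, and the helper functions are hypothetical; only the default values (50,000 characters of tool output, 2,000 lines per read, 2,000 characters per line) come from the video, and Hermes's real truncation logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadLimits:
    # Defaults mirror the figures quoted in the video; field names are hypothetical.
    max_bytes: int = 50_000       # characters kept from one tool output
    max_lines: int = 2_000        # lines returned by one file read
    max_line_chars: int = 2_000   # characters kept from any single line


def clip_tool_output(output: str, limits: ReadLimits) -> str:
    """Keep only the first max_bytes characters; the rest never reaches the context."""
    return output[: limits.max_bytes]


def read_chunk(text: str, limits: ReadLimits, start_line: int = 0) -> List[str]:
    """Return one chunk of a file starting at start_line, cutting overlong lines."""
    lines = text.splitlines()[start_line : start_line + limits.max_lines]
    return [line[: limits.max_line_chars] for line in lines]


if __name__ == "__main__":
    test_log = "E" * 120_000  # a long test-run log
    policy_doc = "\n".join(f"clause {i}" for i in range(3_500))

    default = ReadLimits()
    raised = ReadLimits(max_bytes=200_000, max_lines=5_000)

    print(len(clip_tool_output(test_log, default)))  # 50000
    print(len(clip_tool_output(test_log, raised)))   # 120000
    print(len(read_chunk(policy_doc, default)))      # 2000
    print(len(read_chunk(policy_doc, raised)))       # 3500
```

With the defaults, a 3,500-line policy document has to be read in two chunks, which is the situation the video describes; raising the line limit to 5,000 brings it into a single read.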

Managing the config.yaml file is critical for handling large files and long-running tasks. Increasing 'max bytes' lets the agent pull more of each tool output into context instead of truncating it at 50,000 characters. Raising the compression threshold to 0.75 prevents premature compression on smaller-context models. Adjusting the target ratio balances the size of the retained context tail against the available working space.
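The compression figures in the video can be reproduced with simple arithmetic. Note that the quoted tail sizes (100,000 tokens for a 1-million-token window, 20,000 for a 200,000-token window) equal 20% of the compression trigger point at the default 0.50 threshold, not 20% of the whole window. The sketch below assumes that interpretation; the function names are hypothetical and this is not Hermes's code.

```python
def compression_trigger(context_window: int, threshold: float) -> int:
    """Token count at which compression starts (threshold is a 0-1 fraction)."""
    return round(context_window * threshold)


def tail_tokens(context_window: int, threshold: float, target_ratio: float) -> int:
    """Tokens kept uncompressed after a compression.

    ASSUMPTION: target_ratio is applied to the trigger point, which is what
    reproduces the 100,000 / 20,000 figures quoted in the video.
    """
    if not 0.10 <= target_ratio <= 0.80:
        raise ValueError("the video suggests keeping target_ratio between 0.10 and 0.80")
    return round(compression_trigger(context_window, threshold) * target_ratio)


if __name__ == "__main__":
    for window in (200_000, 1_000_000):
        for threshold in (0.50, 0.75):
            print(
                f"window={window:>9,} threshold={threshold:.2f} "
                f"compress_at={compression_trigger(window, threshold):>7,} "
                f"tail={tail_tokens(window, threshold, 0.20):>7,}"
            )
```

On a 200,000-token window, moving the threshold from 0.50 to 0.75 pushes the compression point from 100,000 to 150,000 tokens, which is the extra working room the video is after.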

Sub-Agent Configuration

  • The default limit of three concurrent sub-agents can create bottlenecks.
  • Max spawn depth determines whether sub-agents can create their own recursive sub-agents.
  • Auto-approve mode removes recurring permission prompts for sub-tasks.

Sub-agents allow for parallelized workflows, but default constraints limit their utility. Raising 'max concurrent children' from 3 to 5 lets up to five sub-agents run in parallel, at a higher token cost. Changing 'max spawn depth' from the default of 1 lets sub-agents spawn their own sub-agents for deeper, nested task execution. Enabling 'auto-approve' keeps sub-agents from stalling on permission prompts during routine operations.
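The two sub-agent limits behave like a concurrency cap plus a recursion guard. The Python sketch below models them with an `asyncio.Semaphore` and a depth check; `SubAgentPool`, `SpawnDepthError`, `run_batch`, and the constants are hypothetical names, the sleep stands in for real sub-agent work, and Hermes's actual scheduler is not shown in the video.

```python
import asyncio

MAX_CONCURRENT_CHILDREN = 5  # video: default 3, raised to 5
MAX_SPAWN_DEPTH = 1          # video: default 1, so sub-agents cannot spawn their own


class SpawnDepthError(RuntimeError):
    """Raised when a sub-agent would be spawned deeper than allowed."""


class SubAgentPool:
    def __init__(self, max_children: int, max_depth: int) -> None:
        self._slots = asyncio.Semaphore(max_children)  # caps parallel sub-agents
        self.max_depth = max_depth
        self.running = 0
        self.peak = 0  # highest number of sub-agents seen running at once

    async def spawn(self, task_id: int, depth: int = 1) -> str:
        # depth 1 = spawned by the main agent; depth 2 = spawned by a sub-agent.
        if depth > self.max_depth:
            raise SpawnDepthError(f"depth {depth} exceeds max spawn depth {self.max_depth}")
        async with self._slots:  # extra sub-agents wait here for a free slot
            self.running += 1
            self.peak = max(self.peak, self.running)
            await asyncio.sleep(0.01)  # stand-in for the sub-agent's real work
            self.running -= 1
        return f"task {task_id} finished at depth {depth}"


async def run_batch(n_tasks: int) -> tuple:
    pool = SubAgentPool(MAX_CONCURRENT_CHILDREN, MAX_SPAWN_DEPTH)
    results = await asyncio.gather(*(pool.spawn(i) for i in range(n_tasks)))
    return pool.peak, results


if __name__ == "__main__":
    peak, results = asyncio.run(run_batch(12))
    print(peak, len(results))  # 5 12
```

Twelve requested sub-agents still all finish, but never more than five run at once, so the cap trades wall-clock time against simultaneous token spend.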

Cost Optimization and Workflow Features

  • Auxiliary models handle trivial background tasks to save costs on primary expensive models.
  • Effort levels and thinking toggles directly impact token consumption.
  • Quick commands, checkpointing, and ephemeral system prompts customize behavior for specific sessions.
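The auxiliary-model fallback described in the video can be sketched as a small routing function: background tasks use an explicitly configured auxiliary model if there is one, otherwise the cheapest model in the catalogue, and everything else stays on the main model. The model names, prices, and the set of background task names below are made-up placeholders, not Hermes's configuration.

```python
from typing import Dict, Optional

# Placeholder catalogue: model names and prices (USD per 1M tokens) are made up.
MODEL_PRICES: Dict[str, float] = {
    "big-main-model": 15.00,
    "mid-model": 3.00,
    "small-flash-model": 0.30,
}

# Hypothetical background tasks that should not use the expensive main model.
BACKGROUND_TASKS = {"web_search", "compression"}


def route_task(task: str, main_model: str, auxiliary: Dict[str, Optional[str]]) -> str:
    """Pick the model for a task, mirroring the fallback described in the video."""
    if task not in BACKGROUND_TASKS:
        return main_model  # real work stays on the main model
    configured = auxiliary.get(task)
    if configured:  # an auxiliary model was set explicitly
        return configured
    return min(MODEL_PRICES, key=MODEL_PRICES.__getitem__)  # left empty: cheapest model


if __name__ == "__main__":
    aux = {"web_search": "mid-model", "compression": None}
    print(route_task("refactor", "big-main-model", aux))     # big-main-model
    print(route_task("web_search", "big-main-model", aux))   # mid-model
    print(route_task("compression", "big-main-model", aux))  # small-flash-model
```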

Cost efficiency is managed by delegating background tasks to cheaper auxiliary models, avoiding the use of expensive primary models like Opus for trivial work. Model 'effort' levels trade reasoning depth against token usage. Workflow tools like custom quick commands, file checkpointing with rollback, and YOLO mode (which skips per-action approval prompts) further tailor the agent to specific operational requirements.
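Quick commands come in two types, exec and alias. The hypothetical Python dispatcher below shows the difference: an exec entry runs a command and returns its stdout (the part that would be dropped into the context window), while an alias entry just resolves to another command name. The table layout, the command names, and `resolve` are assumptions for illustration, not Hermes's config.yaml schema.

```python
import subprocess
import sys
from typing import Dict, List, Tuple, Union

# Hypothetical quick-command table; Hermes's real config.yaml schema may differ.
QUICK_COMMANDS: Dict[str, Dict[str, Union[str, List[str]]]] = {
    # exec: run a command and capture its output for the agent's context
    "pyver": {"type": "exec", "argv": [sys.executable, "--version"]},
    "gs": {"type": "exec", "argv": ["git", "status", "--short"]},  # needs git and a repo
    # alias: a shorter name that resolves to an existing command
    "c": {"type": "alias", "target": "compress"},
}


def resolve(name: str) -> Tuple[str, str]:
    """Return ("exec", captured stdout) or ("alias", target command name)."""
    entry = QUICK_COMMANDS[name]
    if entry["type"] == "exec":
        result = subprocess.run(entry["argv"], capture_output=True, text=True, check=False)
        return "exec", result.stdout
    if entry["type"] == "alias":
        return "alias", str(entry["target"])
    raise ValueError(f"unknown quick-command type: {entry['type']}")


if __name__ == "__main__":
    print(resolve("c"))  # ('alias', 'compress')
    kind, output = resolve("pyver")
    print(kind, output.strip())
```

Passing an argument list instead of a shell string keeps the exec path free of shell-quoting surprises; a git quick command would simply be another exec entry.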
