Transcript
00:00:00ever since we started using hermes we've set up a lot of our workflows on it as we showed you in
00:00:04the previous videos it's been monitoring our apps coordinating the team on slack and more but the
00:00:09more we used it the more we ran into the same problems and it started to feel like our setup
00:00:13wasn't enough like we always do we started looking for ways to solve the issues but that's when we
00:00:18realized we didn't need to add anything else because everything we needed was already in hermes
00:00:23itself we just weren't using it to its full potential now if you're new to the channel then
00:00:27welcome we're a software company and this is ai labs where we show you how to optimize a business
00:00:32with ai using proven methods from our own team and in this video we're going through all the settings
00:00:37we changed to improve our workflow so the first category is all about context and output limits
00:00:43hermes uses the dot hermes folder which holds all the configs and info that run the agent and all of
00:00:49that lives in one single file called config.yaml it's a really long file and it contains every config
00:00:55tied to the agent setup so if you're managing multiple profiles like we are each one gets its
00:01:00own separate folder and every profile has its own config.yaml file so the first one we'll change is
00:01:06max bytes by default this is set to 50,000 which means it pulls 50,000 characters from any tool output
00:01:13into the context window at once and the rest get cut off that became a problem when we were using it
00:01:17to monitor test runs because it wouldn't properly see the issues when they were long so we needed more of
00:01:23that output in the context window for that you can either set max bytes directly in the config.yaml file
00:01:29or change it to the number you need using the hermes config command once that's done it pulls that many
00:01:34characters into the context window from all tool outputs but you'll need to make sure the right
00:01:39profile is selected because the changes you make with the hermes config command show up in your active
00:01:44profile's config file another problem shows up when the agent reads a file with a lot of lines this happened to
00:01:50us when we connected hermes to our company's knowledge base where we have these large policy documents that
00:01:55are easily more than 2000 lines so when it pulled them in by breaking them into chunks it kept missing
00:02:01important details so we raised the line limit to 5000 and let the agent read more of the file at once there's another
00:02:07limit that becomes a problem when you have a lot of large markdown files if your document has long
00:02:12paragraphs stored as a single long line and that line is more than 2000 characters it won't be fully read so if you
00:02:18want to increase that you can change it with the hermes config command and set the character count you
00:02:23need that way the agent can read more than 2000 characters in a single line the first three settings
00:02:28mostly matter if you work with large files but this next one's important for everyone and that's the
00:02:33compression threshold by default the compression threshold is set to 50 percent which means once 50 percent of
00:02:39the context window is filled it compresses everything in there but a lot of other agents like codex and
00:02:45claude code have this set to around 75 percent we ran into this ourselves while running hermes since we'd set
00:02:51it up with a smaller model on a 200,000 token context it compressed too early which isn't ideal when you
00:02:56actually want to get things done now models like opus or the gemini ones with a million token window
00:03:02would be fine here because compression only happens at 500,000 tokens for them but for models with a 200,000
00:03:08token context it happens at 100,000 tokens which causes issues on a long run so we set the compression threshold
00:03:15to 0.75 that way we can at least use 75 percent of the context window before it hits compression another
00:03:22setting is called target ratio which is set to 20 percent by default when hermes hits compress it doesn't
00:03:28compress the entire chat instead it leaves 20 percent of the conversation uncompressed and starts the new
00:03:34conversation with that uncompressed part along with the summary so that uncompressed 20 percent becomes your tail
00:03:40once the new compressed conversation starts now how much is left uncompressed depends on how big
00:03:45your context window is for a 1 million token context window 100,000 tokens get added and for a 200,000
00:03:52token context window only 20,000 tokens get added and this tail gives the agent more context on the
00:03:58previous conversation so it can pick up easily so 20 percent works for us on a 200,000 token context window but if
00:04:04you're on a larger model you can use the config command to set it higher the ideal range is between
00:04:0910 and 80 percent the higher the number the more tokens stay in your context window but you'll also
00:04:15have less free room to work with as we talked about in the previous video the memory.md and user.md
00:04:21files that hermes keeps have a hard limit on how many characters they can hold after that hermes
00:04:26starts dropping information the agent thinks it no longer needs you can change these limits too either
00:04:32directly in your config.yaml file or through the hermes desktop app from the settings pane from there
00:04:37you can also change most of the settings we just talked about and if you're enjoying the video so far
00:04:42subscribe to the channel and hit the hype button this small gesture of support goes a long way for us
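To make the compression numbers from earlier in this section concrete, here is a rough Python sketch of the arithmetic. It assumes the uncompressed tail is the target ratio applied to the compression trigger point, which is what the 100,000 and 20,000 token figures quoted in the video imply. The function names are made up for illustration and are not part of hermes.

```python
def compression_trigger(context_window: int, threshold: float) -> int:
    """Tokens in the context window at which compression kicks in."""
    return round(context_window * threshold)


def uncompressed_tail(context_window: int, threshold: float, target_ratio: float) -> int:
    """Tokens kept uncompressed after compression.

    Assumption: the tail is target_ratio of the trigger point, which matches
    the figures quoted in the video for the default 50 percent threshold.
    """
    return round(compression_trigger(context_window, threshold) * target_ratio)


if __name__ == "__main__":
    # Compare the default threshold with the 0.75 value used in the video.
    for window in (1_000_000, 200_000):
        for threshold in (0.50, 0.75):
            trigger = compression_trigger(window, threshold)
            tail = uncompressed_tail(window, threshold, 0.20)
            print(f"window={window:,} threshold={threshold:.2f} "
                  f"trigger={trigger:,} tail={tail:,}")
```

Under this assumption, moving a 200,000 token model from a 0.50 to a 0.75 threshold pushes the trigger point from 100,000 to 150,000 tokens, and the tail would grow along with it.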
00:04:47the second category is sub-agents on hermes you're limited to spawning three sub-agents at once and when we
00:04:53were working on our projects we hit this limit and things ended up taking longer than they needed to
00:04:58in the config this limit comes from the max concurrent children value which is set to three by default
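As a rough illustration of what a cap like this does, the Python sketch below runs a batch of pretend sub-agent tasks behind a semaphore. It is a generic concurrency pattern, not how hermes implements the limit, and every name in it (`run_children`, the `max_concurrent_children` parameter, the dummy `child` task) is an assumption made for the example.

```python
import asyncio


async def run_children(task_count: int, max_concurrent_children: int) -> int:
    """Run task_count dummy tasks and return the peak number running at once."""
    gate = asyncio.Semaphore(max_concurrent_children)
    running = 0
    peak = 0

    async def child(index: int) -> None:
        nonlocal running, peak
        async with gate:  # waits here while the cap is already reached
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real sub-agent work
            running -= 1

    await asyncio.gather(*(child(i) for i in range(task_count)))
    return peak


if __name__ == "__main__":
    # Same batch of ten tasks, first with a cap of three, then five.
    for cap in (3, 5):
        peak = asyncio.run(run_children(task_count=10, max_concurrent_children=cap))
        print(f"cap={cap} peak concurrent children={peak}")
```

With the same ten tasks, a higher cap means fewer waiting rounds, which is the speedup described here, at the price of more work in flight at once.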
00:05:03since we were running into issues we used the config command and changed this value to five from that
00:05:08point on whenever it spins up sub-agents it can run up to five of them together but this is token
00:05:13heavy so if you're working with a lot of sub-agents cost is something you need to watch out for now in
00:05:18claude code each sub-agent can create its own sub-agents and that's helpful when you're working
00:05:23with a large folder where one agent can branch out into more agents to explore nested repos but hermes
00:05:28blocks this with the max spawn depth flag which is set to one by default and that stops any sub-agent
00:05:34from creating more so you can push the max spawn depth above one after that your sub-agents can create
00:05:39their own sub-agents too there's another sub-agent feature called auto-approve which is set to false by
00:05:44default this means the sub-agents you spawn only inherit the parent's permissions and they might
00:05:50still get blocked by permission prompts so if you want to change this you can set it to true directly
00:05:55here once you've done that your sub-agents can run in auto-approve mode and won't get blocked by any
00:06:00permission prompts sub-agents handle simple tasks like web searches that don't need the heavy lifting of
00:06:06your main model but running them on that powerful model burns a lot of cost for work like this so you
00:06:11can change the model used for any sub-agent and switch it to a smaller one which saves you tokens
00:06:16and if that smaller model is from a different provider you can add it using the hermes auth command
00:06:21which lets you pull in models from whichever provider you want but before we move towards
00:06:25the settings that save us costs let's have a word from our sponsor helix every week there's a new ai
00:06:31tool that helps you build apps websites and products faster than ever but nobody talks about what happens
00:06:37before you start building most people jump straight into coding with a half-baked idea and end up
00:06:42rebuilding the same thing three times helix is an ai guided product planning platform that takes a rough
00:06:48idea and turns it into a structured plan you can actually hand off to a developer or a stakeholder
00:06:53you describe your idea in one sentence and five ai specialist agents go to work covering validation
00:06:59market research product development business modeling and growth strategy it pulls live market data in real
00:07:05time connects to over 20 tools you already use like notion jira and airtable and the canvas adapts to
00:07:11your actual product needs instead of forcing you into a generic template when you're done you export an
00:07:16investor ready pdf blueprint that's actually built on real research not guesswork click the link in
00:07:21the description and try helix for free the third category is cost these are basically the settings that save you
00:07:27tokens when you first set up hermes you give it the models for different purposes but you can set up
00:07:32auxiliary models as well auxiliary models are basically the cheaper faster ones that hermes uses for
00:07:38background subtasks that way the expensive main model you've set up isn't wasted on small tasks that
00:07:43aren't that complicated by default when you leave the auxiliary models empty hermes falls back to the
00:07:48lowest cost model in your config since we were using open router it was set to gemini flash so these
00:07:54cheaper models could handle tasks behind the scenes so if you want to save costs you can set up cheaper
00:07:59models manually they can save you a lot of money on tasks like web searches or compression if your main
00:08:05model is something like opus you probably don't want to waste it on trivial tasks on saving costs another
00:08:11thing you can configure is the effort level of the model you're using effort is basically how much
00:08:15thinking the model puts into a task if the effort is higher the output will be better but the
00:08:20tokens consumed will also be higher you can set it to low or minimum so the agent doesn't waste tokens
00:08:26you can also turn off thinking completely if you don't want to use effort levels the fourth category is
00:08:32workflow and it covers a bunch of other features that make hermes so much better to use the first one is
00:08:37quick commands if you've been using claude code you might know slash commands where you add custom reusable
00:08:43instructions they do a similar job but hermes handles them differently because it doesn't use prompt
00:08:48instructions the way claude code and other agents do quick commands come in two types the first one is
00:08:54exec which runs a terminal command and drops its output into the context window this is helpful for
00:08:59creating scripts that run a whole series of commands from just a single one for example for git operations
00:09:04you can set up a custom exec command and run it whenever you want the agent to use those commands the
00:09:09other type is alias this is less of a custom command and more of a way to rename existing ones for
00:09:15example if you want a quicker way to run compress you can set an alias to just a single letter and run it
00:09:20fast there's no direct way to set this up so you actually have to do it in config.yaml or you can just
00:09:25ask claude code or hermes to do it for you and it'll make the changes itself aside from that hermes has
00:09:31a checkpointing mechanism too a checkpoint is basically a saved state of your files at a certain point in
00:09:36time you can roll back to it if an experiment breaks something it's turned off by default so you'll have to
00:09:41set it to true once checkpointing is on you can use the rollback command to go back to a previous
00:09:46checkpoint another thing you can change is background process notifications if you set this to all you'll
00:09:52get a notification for everything hermes is doing in the background you can change it if you don't want
00:09:56those there's also a flag called hermes ephemeral system prompt which lets you add content into the
00:10:01system prompt of the agent this is an environment variable and the instruction you add as its value
00:10:06becomes part of the system prompt so you can add whatever instructions you want this way but this
00:10:11prompt only applies to the session you open in that terminal and it doesn't stick around long term
00:10:16so it's mainly useful for one-time use cases you can also run hermes in yolo mode which is the same
00:10:22as the dangerously skip permissions mode in claude this stops the agent from sitting there waiting for
00:10:27you to approve every action you can turn it on with the yolo command or by launching hermes with the
00:10:32yolo flag in the terminal at one point we ran into an error and weren't sure if it was coming from hermes
00:10:37itself or from some config we'd set up that's when we came across the ignore user config mode it strips
00:10:43the agent of all the configs in your dot hermes folder and runs it in isolation so you can figure
00:10:48out what's actually causing the error and fix it you can also switch between the multiple personalities
00:10:53that come with it and have fun with the different voice styles already in the configs using the personality
00:10:59command since a lot of people have been asking about it we've put together a starter pack with all the
00:11:03guides and resources you'll need it's available inside our community ai labs pro so if you'd like
00:11:09to support the channel and get access to this resource pack be sure to check it out the link is in the
00:11:14description that brings us to the end of this video if you'd like to support the channel and help us keep
00:11:19making videos like this you can do so by using the super thanks button below as always thank you for
00:11:24watching and i'll see you in the next one