Hermes: The Self-Improving Agent That Gets Smarter Every Day

Better Stack
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00OK, Hermes is an open-source AI agent created by American company Nous Research that is
00:00:06self-improving. So basically, the more you use it, the better it gets. It reflects,
00:00:10learns and evolves on its own, it never forgets anything you've said and it even creates its own
00:00:16skills. But is all of that enough to replace something like OpenClaw, which supports many
00:00:22more channels, has better sandboxing and is much more mature? Hit subscribe and let's get into it.
00:00:30So the name Hermes, surprise, surprise comes from the Greek messenger God. And that's also
00:00:37where this symbol comes from. You'll see more of it later on in the video. But as it stands,
00:00:42I've already made a video about OpenClaw, which is great, but has a lot of features that I won't
00:00:47personally use. And NanoClaw, which has a much smaller feature set, but is built on top of the
00:00:52Claude agent SDK, which is now less usable for me because of the weird rules around using the
00:00:59Claude subscription with third-party tools. So now I'm on the lookout for a new AI assistant
00:01:04and let's see if Hermes, the self-improving AI agent, can fill that void. I'm going to use it
00:01:09to create promotional tweets for me based on past videos that I've created. And I'm going to give it
00:01:14some scripts and directions to get to that stage. Now, this is quite a small task, but the focus is
00:01:20more to see if Hermes can remember my writing style and all the feedback I'm going to give it to create
00:01:26a tweet that I like without me asking it over and over again. Let's go. So I've already gone ahead
00:01:30and installed Hermes using this command, which is very simple, and went through everything from
00:01:35selecting a model (I chose OpenRouter with Gemma 4, but if my hardware could handle it,
00:01:40I would run one locally and connect it to Hermes) to messaging platforms and tools for the CLI. If you've
00:01:45used OpenClaw, this whole process will feel very familiar. I've also set it up on a VPS to be on the
00:01:51safe side, but if you wanted to, you could easily install it locally on your machine. So from here,
00:01:55I'm going to write the Hermes command, which will start a new chat showing the Hermes symbol with
00:01:59the available tools and skills over here. Note, when you run the Hermes command, it creates a new
00:02:04session and doesn't resume the previous one unless you specify otherwise, just like Claude Code. So here I'm
00:02:08going to give it a prompt. I want you to help me write tweets based on the scripts from my videos.
00:02:12Let's go through the process of doing that. After a while, it comes back with a response,
00:02:16which I like the structure of. And so I'm going to give it a follow-up prompt. I have scripts inside
00:02:21the scripts folder, study them to understand my writing style and voice. I've also given it my
00:02:25target audience and the length I'd like my tweets to be. So now it's using some tools to search
00:02:30through my files and after a while it analyses my script to give me a breakdown of my style.
00:02:34So it says I'm pragmatic and sceptical, which is true. I'm developer-centric and I'm transparent
00:02:40and relatable. It's also come up with a strategy for my target audience, which I like the look of.
00:02:45But I've changed my mind. Even though I did say I wanted the tweets to be around 210 characters,
00:02:50I actually want them to be a bit longer. So I'm going to give it a new prompt. And I have noticed
00:02:54it's been taking a while and using a lot of context. So what I can do is change the model
00:02:59mid-session by running the model slash command and specifying the model I want. In this case, I want
00:03:04GLM-4 Turbo. So now it's switched to that model. I'm going to give it a new prompt to make the
00:03:08tweets longer. And it comes back with a response much faster, but it's also added a lot of information to
00:03:13memory without me telling it to. So it's changed the length from 210 to 400 and has changed the style
00:03:19of tweets that I want. Let's see if I can actually generate a decent tweet from my latest script.
00:03:23And it has come up with a pretty decent first attempt, but there are a few things that I wouldn't
00:03:28personally say, like 'breaking a sweat', and I wouldn't use the word 'incredible'. I'd use the phrase
00:03:34'really good'. And after a few tweaks it's come up with a tweet that I'd actually use on my
00:03:39profile. And it's saved all of that to memory. I'm going to prompt it to create a skill so it's easier for
00:03:44me to write tweets in the future. And now it uses the skill manager skill to go ahead and create a
00:03:49skill. Let's see this in action. And look at that, it's written a tweet for me with multiple options,
00:03:54and I can select one I like the most. It's even gone ahead and created a thread that I can use
00:03:59to write multiple tweets if I wanted to. So technically, because it's remembered everything,
00:04:04if I create a brand new Hermes session, change the model from the default and ask it if it knows how
00:04:09I like to write my tweets, it comes back with a response telling me exactly how I like to write
00:04:14my tweets, even down to the type of emojis I like to use. Now you may be wondering how Hermes is
00:04:19able to pull all this information from memory without burning through your tokens. Well,
00:04:24memory is stored in an external file, your memory.md file, or in an external memory provider like
00:04:30Supermemory, Mem0 or OpenViking if you configure one. And memory is preloaded, or prefetched, each session.
00:04:38But it's not the full thing. In fact, it's a compacted version that's limited to about three
00:04:43and a half thousand characters, which is roughly 700 tokens depending on the model. But all sessions
00:04:49are stored inside an SQLite database using FTS5 for full text search. So if you ask Hermes to remember
00:04:56what you said yesterday, it will go into the database, do the search and give you that
00:05:01information. It also does something a bit weird. It compresses your session above 50% of the context window,
00:05:06which is different from something like Claude Code, which does it at 80%. But I guess it's difficult to
00:05:11pick a good threshold since it varies by model, so 50% is a reasonable rough number. But what it does is instead
00:05:17of just compressing the whole thing, it removes the output from old tool calls and keeps the head
00:05:23and tail of the session, but compresses the middle. This is what actually gets saved in the SQLite
00:05:28database, not the full conversation itself. It also nudges itself every 10 or so turns to save important
00:05:35information to memory and also to write a skill whenever that's necessary. Now I know it's very
00:05:39difficult to see the full power of Hermes in this very short demo session that I gave it, but hopefully
00:05:44you can kind of extrapolate how well it will remember and create skills based on the information
00:05:50you give it. And actually I'm going to be using it more often. So this month or maybe the month
00:05:54afterwards, I'm going to focus on using Hermes as my main personal assistant with a very cheap model
00:05:59like GLM and I'll let you know how it goes. But as usual, let me know your thoughts in the comments.
00:06:04Again, don't forget to subscribe and until next time, happy coding.

Key Takeaway

Hermes creates a self-improving feedback loop by automatically extracting skills and compacting conversation history into persistent memory, preloading a roughly 3,500-character summary each session to maintain long-term context without excessive token costs.

Highlights

Hermes is an open-source AI agent that builds its own skills and automatically manages a long-term memory system to improve performance over time.

The system stores persistent memory in external files like memory.md or services such as Mem0, pre-loading a compacted 700-token summary at the start of each session.
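
The character-to-token figure above is only a rough conversion. Here is a minimal sketch of the preload-budget idea, assuming an append-only memory.md and a rough five-characters-per-token estimate (3,500 / 700); the function names are illustrative, not Hermes' actual API:

```python
# Hypothetical helpers illustrating the preload budget; Hermes' real
# implementation is not shown in the video and may differ.
MEMORY_CHAR_BUDGET = 3_500      # ~700 tokens at ~5 chars/token
CHARS_PER_TOKEN_ESTIMATE = 5

def preload_memory(memory_text, budget=MEMORY_CHAR_BUDGET):
    """Return a compacted view of memory.md that fits the session budget."""
    if len(memory_text) <= budget:
        return memory_text
    # Assume memory.md is append-only, so the tail holds the newest facts.
    return memory_text[-budget:]

def estimated_tokens(text):
    """Rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN_ESTIMATE
```

Keeping the tail rather than the head is just one plausible policy; a real compactor would summarize rather than truncate.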

Full session histories are archived in an SQLite database using FTS5 for efficient full-text searches of past interactions.

Context window management triggers at 50% capacity by removing old tool outputs and compressing the middle of the conversation while preserving the head and tail.
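
That pruning strategy can be sketched in a few lines. This is a simplified illustration of the head-and-tail idea under stated assumptions (message dicts with role/content keys, and a one-line placeholder standing in for real summarization), not Hermes' actual code:

```python
def compact_session(messages, keep_head=2, keep_tail=2):
    """Drop old tool outputs and collapse the middle of a transcript.

    messages: list of dicts like {"role": "user", "content": "..."}.
    """
    if len(messages) <= keep_head + keep_tail:
        return messages
    head = messages[:keep_head]
    tail = messages[-keep_tail:]
    middle = messages[keep_head:-keep_tail]
    # Remove the outputs of old tool calls entirely.
    middle = [m for m in middle if m["role"] != "tool"]
    # A real system would summarize; here a placeholder marks the gap.
    summary = {"role": "system",
               "content": f"[compacted: {len(middle)} earlier messages]"}
    return head + [summary] + tail
```

Per the video, it is this compacted form, not the full conversation, that gets archived.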

Users can swap LLM models mid-session using a simple slash command to balance speed, cost, and reasoning quality.

The agent self-nudges every 10 turns to identify and extract important information for permanent storage or new skill creation.
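
A sketch of how such a periodic nudge might be wired in; the constant names and the reminder wording below are invented for illustration:

```python
NUDGE_EVERY = 10
NUDGE_TEXT = ("Reminder: save any durable facts to memory, and consider "
              "writing a skill if a workflow keeps repeating.")

def maybe_nudge(turn_count):
    """Return a reminder message on every 10th turn, otherwise None."""
    if turn_count > 0 and turn_count % NUDGE_EVERY == 0:
        return NUDGE_TEXT
    return None
```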

Timeline

Core Capabilities and Comparison

  • Hermes is a self-evolving AI agent developed by Nous Research that learns from user feedback and creates its own tools.
  • The agent serves as an alternative to OpenClaw and NanoClaw for users seeking local execution or independence from specific subscription rules.

The system distinguishes itself through autonomous evolution where every interaction contributes to a permanent knowledge base. It targets developers who find existing agent SDKs too restrictive or feature-heavy for specialized tasks like content creation and writing style mimicry.

Installation and Environment Setup

  • Installation requires a single CLI command and supports various backend models, such as Gemma or GLM variants via OpenRouter.
  • The agent runs locally or on a Virtual Private Server (VPS) to ensure data privacy and environment control.

The setup process mirrors the workflow of OpenClaw, allowing users to connect to local hardware or cloud-based model providers. Initial execution displays the available tools and skills, and each run starts a fresh session unless the previous one is explicitly resumed.

Style Adaptation and Model Swapping

  • Hermes analyzes local file directories to identify specific writing patterns, such as a pragmatic or developer-centric tone.
  • The model slash command enables immediate switching between different LLMs within a single active session.

During a test run, the agent analyzed video scripts to accurately identify a transparent and relatable brand voice. When high context usage slowed the response time, switching to a faster model like GLM-4 Turbo allowed the task to proceed without losing the established stylistic parameters.
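
The mechanics of such a switch can be sketched as a tiny dispatcher. The Session class and model identifiers below are invented for illustration; the point is that only the model name changes while the conversation history carries over:

```python
class Session:
    def __init__(self, model):
        self.model = model
        self.history = []

    def handle(self, line):
        # A command like "/model <name>" swaps the backend model in place.
        if line.startswith("/model "):
            self.model = line.split(" ", 1)[1].strip()
            return f"switched to {self.model}"
        self.history.append(line)  # history survives any model switch
        return f"[{self.model}] would answer: {line!r}"

session = Session("gemma-4")
session.handle("make the tweets longer")
session.handle("/model glm-4-turbo")
```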

Skill Creation and Autonomous Memory

  • The Skill Manager tool converts repeated feedback and instructions into permanent executable skills for future use.
  • New sessions retain specific preferences like emoji usage and character counts even after the original conversation ends.

After refining a tweet draft, the agent saved the final preferences to its memory and generated a reusable skill for future tweet threads. This eliminates the need for repetitive prompting, as the agent recalls specific vocabulary constraints and formatting rules across separate sessions.
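
The video doesn't show what a saved skill looks like on disk, so the format below is purely hypothetical; it just captures the idea of distilling accumulated feedback into a named, reusable instruction file:

```python
from pathlib import Path

def save_skill(skills_dir, name, instructions):
    """Persist a skill so future sessions can reuse it without re-prompting."""
    skills_dir = Path(skills_dir)
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{name}.md"
    path.write_text(f"# Skill: {name}\n\n{instructions}\n")
    return path

# Preferences gathered over the session become the skill's instructions.
skill = save_skill(
    "skills",
    "write-tweet",
    "Target length ~400 characters. Pragmatic, developer-centric voice. "
    "Avoid 'incredible'; prefer 'really good'.",
)
```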

Technical Architecture of Memory and Compression

  • Memory management utilizes a compacted version of the memory.md file, limited to roughly 3,500 characters to optimize token usage.
  • Context compression occurs at 50% window utilization by purging old tool call results while retaining the start and end of the dialogue.
  • An SQLite database provides the backbone for long-term storage, enabling the agent to retrieve facts from previous days through full-text search.

To prevent context bloat, Hermes employs compression logic that triggers earlier than competing tools like Claude Code (at 50% of the context window rather than 80%). By preloading a roughly 700-token memory summary and using an external database for deep retrieval, the agent maintains high relevance without hitting token limits.
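
Deep retrieval of this kind takes only a few lines with SQLite's FTS5 extension, which Python's bundled sqlite3 normally includes. The schema and sample rows below are invented; only the FTS5 mechanism itself is from the video:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every word in each stored session.
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(day, content)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("2024-05-01", "we settled on 400-character tweets"),
     ("2024-05-02", "switched the model to a cheaper one")],
)
# "What did I say about tweets?" becomes a full-text MATCH query.
rows = db.execute(
    "SELECT day, content FROM sessions WHERE sessions MATCH ?",
    ("tweets",),
).fetchall()
# rows -> [("2024-05-01", "we settled on 400-character tweets")]
```

Because only matching rows are pulled into context, recalling yesterday's conversation costs far fewer tokens than replaying it.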
