Hermes: The Self-Improving Agent That Gets Smarter Every Day

Better Stack
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00OK, Hermes is an open-source AI agent created by American company Nous Research that is
00:00:06self-improving. So basically, the more you use it, the better it gets. It reflects,
00:00:10learns and evolves on its own, it never forgets anything you've said and it even creates its own
00:00:16skills. But is all of that enough to replace something like OpenClaw, which supports many
00:00:22more channels, has better sandboxing and is much more mature? Hit subscribe and let's get into it.
00:00:30So the name Hermes, surprise, surprise comes from the Greek messenger God. And that's also
00:00:37where this symbol comes from. You'll see more of it later on in the video. But as it stands,
00:00:42I've already made a video about OpenClaw, which is great, but has a lot of features that I won't
00:00:47personally use. And NanoClaw, which has a much smaller feature set, but is built on top of the
00:00:52Claude agent SDK, which is now less usable for me because of the weird rules around using the
00:00:59Claude subscription with third-party tools. So now I'm on the lookout for a new AI assistant
00:01:04and let's see if Hermes, the self-improving AI agent, can fill that void. I'm going to use it
00:01:09to create promotional tweets for me based on past videos that I've created. And I'm going to give it
00:01:14some scripts and directions to get to that stage. Now, this is quite a small task, but the focus is
00:01:20more to see if Hermes can remember my writing style and all the feedback I'm going to give it to create
00:01:26a tweet that I like without me asking it over and over again. Let's go. So I've already gone ahead
00:01:30and installed Hermes using this command, which is very simple, and went through everything from
00:01:35selecting a model (I chose OpenRouter with Gemma 4, but if my hardware could handle it,
00:01:40I would run one locally and connect it to Hermes) to messaging platforms and tools for the CLI. If you've
00:01:45used OpenClaw, this whole process will feel very familiar. I've also set it up on a VPS to be on the
00:01:51safe side, but if you wanted to, you could easily install it locally on your machine. So from here,
00:01:55I'm going to write the Hermes command, which will start a new chat showing the Hermes symbol with
00:01:59the available tools and skills over here. Note, when you run the Hermes command, it creates a new
00:02:04session and doesn't resume the previous one unless you specify otherwise, just like Claude Code. So here I'm
00:02:08going to give it a prompt. I want you to help me write tweets based on the scripts from my videos.
00:02:12Let's go through the process of doing that. After a while, it comes back with a response,
00:02:16which I like the structure of. And so I'm going to give it a follow-up prompt. I have scripts inside
00:02:21the scripts folder, study them to understand my writing style and voice. I've also given it my
00:02:25target audience and the length I'd like my tweets to be. So now it's using some tools to search
00:02:30through my files and after a while it analyses my script to give me a breakdown of my style.
00:02:34So it says I'm pragmatic and sceptical, which is true. I'm developer-centric and I'm transparent
00:02:40and relatable. It's also come up with a strategy for my target audience, which I like the look of.
00:02:45But I've changed my mind. Even though I did say I wanted the tweets to be around 210 characters,
00:02:50I actually want them to be a bit longer. So I'm going to give it a new prompt. And I have noticed
00:02:54it's been taking a while and using a lot of context. So what I can do is change the model
00:02:59mid-session by running the model slash command and specifying the model I want. In this case, I want
00:03:04GLM-4 Turbo. So now it's switched to that model. I'm going to give it a new prompt to make the
00:03:08tweets longer. And it comes back with a response much faster, but it's also added a lot of information to
00:03:13memory without me telling it to. So it's changed the length from 210 to 400 and has changed the style
00:03:19of tweets that I want. Let's see if I can actually generate a decent tweet from my latest script.
00:03:23And it has come up with a pretty decent first attempt, but there are a few things that I wouldn't
00:03:28personally say, like 'breaking a sweat', and I wouldn't use the word 'incredible'. I'd use the phrase
00:03:34'really good'. And after a few tweaks it's come up with a tweet that I'd actually use on my
00:03:39profile. And it's saved all of that to memory. I'm going to prompt it to create a skill so it's easier for
00:03:44me to write tweets in the future. And now it uses the skill manager skill to go ahead and create a
00:03:49skill. Let's see this in action. And look at that, it's written a tweet for me with multiple options,
00:03:54and I can select one I like the most. It's even gone ahead and created a thread that I can use
00:03:59to write multiple tweets if I wanted to. So technically, because it's remembered everything,
00:04:04if I create a brand new Hermes session, change the model from the default and ask it if it knows how
00:04:09I like to write my tweets, it comes back with a response telling me exactly how I like to write
00:04:14my tweets, even down to the type of emojis I like to use. Now you may be wondering how Hermes is
00:04:19able to pull all this information from memory without burning through your tokens. Well,
00:04:24memory is stored in an external file, your memory.md file, or in an external memory provider like
00:04:30Supermemory, Mem0 or OpenViking if you configure one. And memory is preloaded, or prefetched, each session.
00:04:38But it's not the full thing. In fact, it's a compacted version that's limited to about three
00:04:43and a half thousand characters, which is roughly 700 tokens depending on the model. But all sessions
00:04:49are stored inside an SQLite database using FTS5 for full text search. So if you ask Hermes to remember
00:04:56what you said yesterday, it will go into the database, do the search and give you that
00:05:01information. It also does something a bit weird. It compresses your session above 50% of the context window,
00:05:06which is different from something like Claude Code, which does it at 80%. But I guess it's difficult to
00:05:11pick a good threshold since it varies by model, so 50% is a reasonable rough number. But what it does is instead
00:05:17of just compressing the whole thing, it removes the output from old tool calls and keeps the head
00:05:23and tail of the session, but compresses the middle. This is what actually gets saved in the SQLite
00:05:28database, not the full conversation itself. It also nudges itself every 10 or so turns to save important
00:05:35information to memory and also to write a skill whenever that's necessary. Now I know it's very
00:05:39difficult to see the full power of Hermes in this very short demo session that I gave it, but hopefully
00:05:44you can kind of extrapolate how well it will remember and create skills based on the information
00:05:50you give it. And actually I'm going to be using it more often. So this month or maybe the month
00:05:54afterwards, I'm going to focus on using Hermes as my main personal assistant with a very cheap model
00:05:59like GLM and I'll let you know how it goes. But as usual, let me know your thoughts in the comments.
00:06:04Again, don't forget to subscribe and until next time, happy coding.

Key Takeaway

Hermes creates a self-improving feedback loop by automatically extracting skills and compacting conversation history into persistent memory, preloading a roughly 3,500-character summary each session to maintain long-term context without excessive token costs.

Highlights

Hermes is an open-source AI agent that builds its own skills and automatically manages a long-term memory system to improve performance over time.

The system stores persistent memory in external files like memory.md or services such as Mem0, pre-loading a compacted 700-token summary at the start of each session.
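
The character-to-token figure above is only a rough conversion. Here is a minimal sketch of the preload-budget idea, assuming an append-only memory.md and a rough five-characters-per-token estimate (3,500 / 700); the function names are illustrative, not Hermes' actual API:

```python
# Hypothetical helpers illustrating the preload budget; Hermes' real
# implementation is not shown in the video and may differ.
MEMORY_CHAR_BUDGET = 3_500      # ~700 tokens at ~5 chars/token
CHARS_PER_TOKEN_ESTIMATE = 5

def preload_memory(memory_text, budget=MEMORY_CHAR_BUDGET):
    """Return a compacted view of memory.md that fits the session budget."""
    if len(memory_text) <= budget:
        return memory_text
    # Assume memory.md is append-only, so the tail holds the newest facts.
    return memory_text[-budget:]

def estimated_tokens(text):
    """Rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN_ESTIMATE
```

Keeping the tail rather than the head is just one plausible policy; a real compactor would summarize rather than truncate.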

Full session histories are archived in an SQLite database using FTS5 for efficient full-text searches of past interactions.

Context window management triggers at 50% capacity by removing old tool outputs and compressing the middle of the conversation while preserving the head and tail.
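
That pruning strategy can be sketched in a few lines. This is a simplified illustration of the head-and-tail idea under stated assumptions (message dicts with role/content keys, and a one-line placeholder standing in for real summarization), not Hermes' actual code:

```python
def compact_session(messages, keep_head=2, keep_tail=2):
    """Drop old tool outputs and collapse the middle of a transcript.

    messages: list of dicts like {"role": "user", "content": "..."}.
    """
    if len(messages) <= keep_head + keep_tail:
        return messages
    head = messages[:keep_head]
    tail = messages[-keep_tail:]
    middle = messages[keep_head:-keep_tail]
    # Remove the outputs of old tool calls entirely.
    middle = [m for m in middle if m["role"] != "tool"]
    # A real system would summarize; here a placeholder marks the gap.
    summary = {"role": "system",
               "content": f"[compacted: {len(middle)} earlier messages]"}
    return head + [summary] + tail
```

Per the video, it is this compacted form, not the full conversation, that gets archived.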

Users can swap LLM models mid-session using a simple slash command to balance speed, cost, and reasoning quality.

The agent self-nudges every 10 turns to identify and extract important information for permanent storage or new skill creation.
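
A sketch of how such a periodic nudge might be wired in; the constant names and the reminder wording below are invented for illustration:

```python
NUDGE_EVERY = 10
NUDGE_TEXT = ("Reminder: save any durable facts to memory, and consider "
              "writing a skill if a workflow keeps repeating.")

def maybe_nudge(turn_count):
    """Return a reminder message on every 10th turn, otherwise None."""
    if turn_count > 0 and turn_count % NUDGE_EVERY == 0:
        return NUDGE_TEXT
    return None
```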

Timeline

Core Capabilities and Comparison

  • Hermes is a self-evolving AI agent developed by Nous Research that learns from user feedback and creates its own tools.
  • The agent serves as an alternative to OpenClaw and NanoClaw for users seeking local execution or independence from specific subscription rules.

The system distinguishes itself through autonomous evolution where every interaction contributes to a permanent knowledge base. It targets developers who find existing agent SDKs too restrictive or feature-heavy for specialized tasks like content creation and writing style mimicry.

Installation and Environment Setup

  • Installation requires a single CLI command and supports various backend models, such as Gemma or GLM variants via OpenRouter.
  • The agent runs locally or on a Virtual Private Server (VPS) to ensure data privacy and environment control.

The setup process mirrors the workflow of OpenClaw, allowing users to connect to local hardware or cloud-based model providers. Initial execution displays the available tools and skills, and each run starts a fresh session unless the previous one is explicitly resumed.

Style Adaptation and Model Swapping

  • Hermes analyzes local file directories to identify specific writing patterns, such as a pragmatic or developer-centric tone.
  • The model slash command enables immediate switching between different LLMs within a single active session.

During a test run, the agent analyzed video scripts to accurately identify a transparent and relatable brand voice. When high context usage slowed the response time, switching to a faster model like GLM-4 Turbo allowed the task to proceed without losing the established stylistic parameters.
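
The mechanics of such a switch can be sketched as a tiny dispatcher. The Session class and model identifiers below are invented for illustration; the point is that only the model name changes while the conversation history carries over:

```python
class Session:
    def __init__(self, model):
        self.model = model
        self.history = []

    def handle(self, line):
        # A command like "/model <name>" swaps the backend model in place.
        if line.startswith("/model "):
            self.model = line.split(" ", 1)[1].strip()
            return f"switched to {self.model}"
        self.history.append(line)  # history survives any model switch
        return f"[{self.model}] would answer: {line!r}"

session = Session("gemma-4")
session.handle("make the tweets longer")
session.handle("/model glm-4-turbo")
```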

Skill Creation and Autonomous Memory

  • The Skill Manager tool converts repeated feedback and instructions into permanent executable skills for future use.
  • New sessions retain specific preferences like emoji usage and character counts even after the original conversation ends.

After refining a tweet draft, the agent saved the final preferences to its memory and generated a reusable skill for future tweet threads. This eliminates the need for repetitive prompting, as the agent recalls specific vocabulary constraints and formatting rules across separate sessions.
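
The video doesn't show what a saved skill looks like on disk, so the format below is purely hypothetical; it just captures the idea of distilling accumulated feedback into a named, reusable instruction file:

```python
from pathlib import Path

def save_skill(skills_dir, name, instructions):
    """Persist a skill so future sessions can reuse it without re-prompting."""
    skills_dir = Path(skills_dir)
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{name}.md"
    path.write_text(f"# Skill: {name}\n\n{instructions}\n")
    return path

# Preferences gathered over the session become the skill's instructions.
skill = save_skill(
    "skills",
    "write-tweet",
    "Target length ~400 characters. Pragmatic, developer-centric voice. "
    "Avoid 'incredible'; prefer 'really good'.",
)
```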

Technical Architecture of Memory and Compression

  • Memory management utilizes a compacted version of the memory.md file, limited to roughly 3,500 characters to optimize token usage.
  • Context compression occurs at 50% window utilization by purging old tool call results while retaining the start and end of the dialogue.
  • An SQLite database provides the backbone for long-term storage, enabling the agent to retrieve facts from previous days through full-text search.

To prevent context bloat, Hermes employs compression logic that triggers earlier than competing tools like Claude Code (at 50% of the context window rather than 80%). By preloading a roughly 700-token memory summary and using an external database for deep retrieval, the agent maintains high relevance without hitting token limits.
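
Deep retrieval of this kind takes only a few lines with SQLite's FTS5 extension, which Python's bundled sqlite3 normally includes. The schema and sample rows below are invented; only the FTS5 mechanism itself is from the video:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every word in each stored session.
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(day, content)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("2024-05-01", "we settled on 400-character tweets"),
     ("2024-05-02", "switched the model to a cheaper one")],
)
# "What did I say about tweets?" becomes a full-text MATCH query.
rows = db.execute(
    "SELECT day, content FROM sessions WHERE sessions MATCH ?",
    ("tweets",),
).fetchall()
# rows -> [("2024-05-01", "we settled on 400-character tweets")]
```

Because only matching rows are pulled into context, recalling yesterday's conversation costs far fewer tokens than replaying it.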
