AI agents are moving beyond simple command-line tools into systems that can plan, schedule, call tools, and run automated workflows. Nous Research’s Hermes Agent framework offers a self-hosted runtime for building advanced agents with state management, tool integration, and secure execution.

It supports multi-step planning, background task control, and real-world automation beyond single-purpose coding assistants. In this article, we explore Hermes Agent’s architecture, setup, security model, and practical examples for building reliable AI agent workflows.

Hermes is not just a prompt wrapper: it is an open-source agent runtime with multiple entry points, including a CLI, API server, and messaging gateway. It combines browser automation, terminal execution, file operations, memory, skills, and scheduling to support a wide range of real-world automation workflows.

Its layered architecture separates concerns and keeps the system manageable. User requests enter through the CLI or API, then move into the agent core, which generates prompts, calls the language model, runs tools, handles retries, and can fall back to alternate models when needed. This makes Hermes more resilient to rate limits, server errors, and authentication issues.

The diagram below combines the official architecture, agent loop, session storage, and tools runtime documentation.

Hermes shows its strength inside the agent turn loop. It runs one call per tool, but when the model requests multiple tools, Hermes executes them in parallel through a thread pool, speeding up complex workflows. It also manages the model context window by compressing conversations once they exceed 50% of the available context, while preserving recent messages and grouping related tool calls and results logically.

State management is handled through a local SQLite database with full-text search, allowing the agent to revisit past sessions and retrieve relevant context. Long-term memory is stored in two Markdown files: MEMORY.md

for general facts and USER.md

for user-specific preferences. Hermes also supports skills as procedural memory, letting agents create, update, and remove workflows over time.

Since Hermes is evolving quickly, tool counts and details may vary across documentation pages. For serious use, pin the Hermes version to keep results repeatable and avoid breaking configurations.

Hermes offers a clean, single-line installer. Note, native Windows is not supported. Use WSL2 for Windows users. All that is required is the software Git. The correct versions of Python, Node.js and other necessary command-line tools are automatically installed.

Linux / macOS / WSL2 / Android (Termux)

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Reload your shell

source ~/.bashrc # or source ~/.zshrc

Choose your model/provider interactively

hermes model

In this blog we will set up Ollama local model inside the hermes agent

Diagnose setup if needed

hermes doctor

Let’s test the agent type the following in terminal

hermes chat

One of the best design decisions made in Hermes is in regard to configuration management. It utilizes two different files. Secrets, such as API keys, are placed inside of ./.hermes/.env. Non-secret settings are stored in ~/.hermes/config.yaml. This separation is a best practice in securing. Values are automatically inserted in the proper file by the hermes config set command.

Use a conservative profile to ensure a safe and repeatable setup. The following setup could be used to allow manual approval of sensitive actions, execute terminal commands in a container with sandboxing, and prevent use of private network addresses.

If you want to set up LLM from another provider, first create the secrets file. This enables the API server and configures API keys for your chosen LLM provider and a cloud browser service.

Secrets and service toggles in ~/.hermes/.env

cat > ~/.hermes/.env <<'EOF'

OPENROUTER_API_KEY=replace-me

BROWSERBASE_API_KEY=replace-me

BROWSERBASE_PROJECT_ID=replace-me

API_SERVER_ENABLED=true

API_SERVER_KEY=replace-me-local-dev

EOF

Then, a main configuration file is created. The following example is based on a Docker backend for the terminal that will allow code to be executed in a secure and separated environment. It is the recommended solution for any serious self-hosted automation.

Main settings in ~/.hermes/config.yaml

model: anthropic/claude-3-5-sonnet-20240620 # Replace with your provider/model

terminal:

backend: docker

docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"

container_persistent: true

browser:

inactivity_timeout: 120

memory:

memory_enabled: true

user_profile_enabled: true

approvals:

mode: manual

security:

allow_private_urls: false

display:

streaming: true

Hermes is model-agnostic. Use an API from an API provider such as Anthropic or OpenAI, or connect to an API routing service such as OpenRouter or a self-hosted API that is OpenAI-compatible. For the purposes of this article we are using a specific model and it is important to note that this can be extended to any provider model you would like to use.

Now, let’s explore the practical capabilities of the Hermes Agent. These tutorials demonstrate core features that enable complex, autonomous workflows.

Hermes includes a real cron subsystem for scheduled tasks. You can create recurring jobs using plain language. These jobs can run scripts, summarize files, or perform other actions. Results can be delivered to your chat, saved to a file, or sent to other platforms. The agent manages these jobs through its cronjob tool.

For example, you can start a chat session and give it a scheduled task.

Input: “Every weekday at 08:30, read ~/reports/daily_sales.csv, summarise anomalies, and send the result to my home channel.”

Hermes will create a job and schedule its next run. You can then inspect and manage your jobs from the command line.

Inspect and manage jobs from the CLI

hermes cron list

hermes cron status

hermes cron run <job_id>

hermes cron pause <job_id>

To prevent runaway loops, Hermes enforces an important safety constraint. A session started by a cron job cannot create new cron jobs. If you try, the agent will block the action. This demonstrates the framework’s focus on stable, reliable automation.

The browser tooling in Hermes is powerful. It supports cloud browser providers like Browserbase and can also control a local Chrome or Chromium instance. Instead of just fetching raw HTML, Hermes represents web pages as accessibility trees. This structured format makes it easier for a language model to navigate and interact with page elements.

Let’s try a simple research task. This prompt asks the agent to navigate a website, find information, and summarize an article.

Input: “Open https://news.ycombinator.com list the top 5 stories, click the first one, then summarise the article’s core claim and any obvious caveats.”

This task showcases the agent’s ability to perform multi-step web interactions. It also provides an opportunity to test its security features. If by default, the configuration blocks access to private URLs. If you ask the agent to open a local address like http://localhost:3000 it should refuse the request.

Failure Mode Input: “Open http://localhost:3000 and take a screenshot of the dashboard.”

With allow_private_urls

set to false, Hermes will block this action to prevent a potential Server-Side Request Forgery (SSRF) attack. However, Hermes has a smart solution for developers who need to work with both public sites and local applications. It can be configured to automatically route private URLs to a local browser while sending public URLs to the cloud provider. This is a strong production feature that balances security and convenience.

Hermes uses its memory files, MEMORY.md

and USER.md

, to retain information across sessions. These files are injected into the system prompt when a new session starts. This gives the agent consistent context about your preferences and ongoing projects. It is a Self Improving agent it saves the user preferences and improve it over time.

Here is a simple conversation to test its memory.

Turn 1: “Remember that I want CSV outputs, British English, and concise executive summaries.”

Turn 2: “Also remember that my default project language is Python.”

After these turns, start a completely new session and ask a question to check its recall.

Fresh Session Input: “What output format, English variant, and language do I prefer?”

The agent should correctly retrieve the preferences you stored. Memory is injected at the start of a session, so a fresh session is the cleanest way to test this feature. The agent also rejects duplicate memories, so asking it to store the same fact twice is another simple way to see its internal logic at work.

For truly complex tasks, Hermes offers advanced multi-step planning tools. These include persistent goals, sub-agent delegation, and programmatic tool calls.

/goal

command. The agent will continue working on this goal across multiple turns until a judge model determines it is complete or you pause it. execute_code

tool is perhaps the most powerful feature. It allows the model to write and run a Python script that calls other Hermes tools. The script communicates with the agent over a local RPC bridge. This is highly efficient, as it can collapse a long, token-heavy sequence of tool calls into a single model turn.Consider a research task that involves searching the web, fetching several pages, and summarizing them. A typical agent might do this with a dozen back-and-forth turns with the model. With execute_code

, the model can write one script to do it all.

Example script for execute_code

from hermes_tools import web_search, web_extract

import json

results = web_search("Rust async runtime comparison 2025", limit=5)

summaries = []

for r in results["data"]["web"]:

page = web_extract([r["url"]])

for p in page.get("results", []):

if p.get("content"):

summaries.append({

"title": r["title"],

"url": r["url"],

"excerpt": p["content"][:500],

})

print(json.dumps(summaries, indent=2))

This feature is designed for heavy lifting. It has configurable limits on execution time and output size. If a script times out, the agent receives a timeout status and can decide how to proceed. This makes the agent operations layer more robust and predictable.

Hermes is designed to be integrated with other systems. It has an API server that enables any front end that supports chat-completions to integrate with it. The Python library allows you to integrate the agent into other applications. Even it is possible to make Hermes available as a Model Context Protocol (MCP) server, for other agents to use its tools.

When comparing Hermes to other tools, focus on positioning.

Hermes is not fee based, but operational. The primary expense is the model inference, cloud browser sessions, sandbox compute. These costs can be managed by Hermes using provider routing policies which can be optimized for price or latency. Also, don’t forget to plan for benchmark runs; these can be resource intensive.

Hermes Agent stands out because it combines the core pieces needed for real-world AI agents: state, routing, tooling, memory, scheduling, and evaluation hooks in one package. For self-hosted automation enthusiasts, that makes it more than a coding assistant; it becomes a serious operations layer for building useful automations.

Use it with discipline. Pin environment versions, grant only necessary privileges, and test both successful workflows and failure modes. Keep official benchmarks separate from personal results. Used carefully, Hermes can support sophisticated, reliable AI-powered systems.

A. Yes, Hermes Agent is open source under the MIT license. You may only need to pay for LLM inference, cloud tools, browsers, or hosting.

A. Yes, Hermes Agent can run on Windows through WSL2, since it is not available as a native Windows operating system application.

A. Hermes offers CLI, API, gateway, memory, scheduling, and security controls, making it broader than coding agents tied to an IDE or CLI.

Hermes Agent Guide: What is it and How to Use it?