The Agent Loop and Skills: How OpenClaw Actually Thinks

Understanding the Core Architecture, Memory Management, and Skill Extensibility

Feb 21, 2026

I spent an afternoon trying to break the skills in various AI agents. I wrote a deliberately vague skill definition, gave it a poorly described tool dependency, and asked the agent to do something the instructions didn’t quite cover. It mostly figured it out. That surprised me more than I expected it to.

Let me tell you what I think is actually going on here — and where the magic is real versus where it’s just well-placed scaffolding.

The agent loop — the part where OpenClaw calls the LLM, gets tool requests, executes them, and loops — is not the hardest part. Pi Agent Core handles the core loop cleanly. But don’t mistake “well-abstracted” for “trivial”: streaming partial tool results, handling backpressure across WebSocket connections, and correctly injecting tool results back into the model turn are all places where implementations quietly break.

The harder part, as one architecture writeup put it, is “everything around it.” Channel normalization. Session routing. Memory that survives context window limits. Skills that non-engineers can write. Security that doesn’t require a PhD.

This article covers OpenClaw’s answers to all of those.

The Agent Loop in Detail

Every incoming message follows the same path:

Diagram 1 — Message intake to session routing:

Diagram 2 — The agentic tool loop (repeats until no tool calls):

Diagram 3 — Persistence after every response:

Steps in diagram 2 (call → tool → result → call) repeat in a loop. This is what makes it “agentic” — it doesn’t just answer, it acts, observes, and acts again until the job is done.

The thinking level is configurable per session. OpenClaw exposes Claude’s extended thinking as off / minimal / low / medium / high / xhigh. Higher thinking levels mean the model spends more tokens reasoning before responding — useful for complex tasks, wasteful for simple questions. You can set a default and override per-session with /thinking high.

How the System Prompt Gets Built

This is where I think OpenClaw is doing something genuinely clever. Before the LLM call, the Gateway assembles the system prompt from multiple files in the workspace:

System prompt assembly (what files build the context):

These are Markdown files you can read and edit. Want the agent to always respond in bullet points? Edit SOUL.md. Want it to remember you prefer concise answers? Add it to MEMORY.md. Want to disable a routing rule? Edit AGENTS.md.

This “context as files” design is one of OpenClaw’s most important choices. It makes the agent’s behavior transparent and inspectable in a way that cloud AI assistants fundamentally cannot be. I’ve read my own SOUL.md more times than I’ve read the system prompt of any other AI tool I use — because I actually can.

Memory: The Part That Actually Works

Context windows are finite. Claude’s context window is large, but a 24/7 agent that processes hundreds of messages will hit limits eventually. OpenClaw’s memory architecture is designed to handle this gracefully, and honestly I was skeptical it would work. It does.

Two-tier memory architecture:

The daily note is an ephemeral scratch pad. New facts worth keeping get promoted to MEMORY.md — either by the agent (via tool call) or by you directly (by editing the file).

Context compaction is what happens when the conversation grows too long:

The system detects the context window is approaching capacity
Before pruning, it runs a “silent pre-compaction memory flush”: prompts the model to extract key facts from the conversation and write them to MEMORY.md
The oldest conversation history is summarized and replaced with a compact summary
The summary is prepended to the pruned history
The session continues

The model never “forgets” important information — it migrates it to persistent storage before discarding the raw history.

You can also trigger compaction manually with /compact, which runs an explicit summarization of the current session.

Memory search: The memory manager uses a hybrid approach — vector embeddings plus BM25 (keyword) search — over workspace Markdown files and session transcripts. This matters because neither approach alone is reliable: pure vector search misses exact strings (names, IDs, commands), while pure keyword search can’t surface semantically related memories with different wording. The hybrid finds “that GitHub repo we discussed in December” even if the query is “the open-source project you recommended.” The agent can retrieve relevant memories from months ago without them all being in the active context window.

What surprised me most about the memory system was that it’s not magic — it’s just well-organized files. Everything the agent “knows” about you is in MEMORY.md. Open it and read it. You’ll probably find it unsettling how much you’ve told it without realizing.

The Skills System: Where It Gets Interesting

Skills are the extensibility layer. Every capability beyond the core tools (browser, file, shell) is a skill.

A skill is a folder containing a SKILL.md file:

~/.openclaw/workspace/skills/
└── daily-standup/
    └── SKILL.md

The SKILL.md file has YAML frontmatter and natural-language instructions:

---
name: daily-standup
description: Collects and posts daily standup updates
version: 1.0.0
os: ["linux", "macos", "windows"]
requires:
  - slack
triggers:
  - cron: "0 9 * * 1-5"
---

## Instructions

At 9 AM Monday through Friday:

1. Send the user a message: "Time for your standup! What did you work on
   yesterday? What are you doing today? Any blockers?"
2. Wait for their reply.
3. Format the response as a Slack standup post with three sections:
   Yesterday, Today, Blockers.
4. Post to the #standup channel.
5. Confirm to the user that it was posted.

That’s the entire skill. The agent reads the SKILL.md, understands the intent, and executes it using the Slack tool that’s already available.

Skill Loading Precedence

Skill loading priority (highest wins):

Skills load from three sources in priority order. This means you can override any bundled or managed skill by placing a same-named skill in your workspace. No patching required.

OS gating: Skills declare OS requirements in the frontmatter. On a Linux Gateway, macOS-only skills (like those using Spotlight or AppleScript) don’t appear in the system prompt and aren’t offered to the agent. Three workarounds exist if you need macOS tools from Linux: run the Gateway on macOS, use a paired macOS node for RPC execution, or proxy the binaries over SSH.

ClawHub: The Skill Registry

ClawHub is the community skill registry. The agent can search it directly:

You: "find me a skill that can manage my GitHub issues"
Agent: Searching ClawHub... found github-issues v2.1. Install it?
You: yes
Agent: Installed. I can now manage your GitHub issues.

Skills install to ~/.openclaw/skills/. No restart required.

Security note: Research found 341 malicious skills on ClawHub in early 2026, representing a 12% contamination rate. OpenClaw partnered with VirusTotal for automated scanning after this discovery. Review skills before installing them from community sources, particularly those requesting broad system access. The SKILL.md is readable Markdown — always check what tools it uses before saying yes.

Self-Modifying Skills

One of OpenClaw’s more striking features: the agent can write its own skills.

If you ask it to “create a skill that monitors my inbox and flags urgent emails,” a real interaction looks like this:

You: Create a skill that monitors my Gmail for urgent emails and
     pings me on Telegram if one arrives.

Agent: I'll create that skill. Let me check what tools are available
       for Gmail and notifications...

       I'll use gmail.watch (Pub/Sub listener) and telegram.send.
       Here's what I'm writing to workspace/skills/urgent-email-monitor/SKILL.md:

       ---
       name: urgent-email-monitor
       description: Monitors Gmail for urgent emails and alerts via Telegram
       version: 1.0.0
       requires:
         - gmail
         - telegram
       triggers:
         - gmail.pubsub: "urgent OR URGENT OR important"
       ---

       When triggered by an incoming Gmail message matching the filter:
       1. Read the email subject and sender.
       2. Determine if it is genuinely urgent (not newsletter, not CC).
       3. If urgent, send a Telegram message: "Urgent email from [sender]: [subject]"
       4. Mark the email as important in Gmail.

       Saved and hot-reloaded. The skill is active — I'll watch your inbox
       and ping you on Telegram for anything urgent.

No compilation. No deployment step. No SDK. The agent writes the SKILL.md, saves it, and the Gateway’s file watcher picks it up immediately. The agent can extend its own capabilities within the same conversation turn.

The limitation: skills written this way are only as reliable as the agent’s understanding of available tools and the TOOLS.md schema. Prompt the agent to explain what it’s doing before it writes the file if you’re unsure. I’ve had it write skills that almost worked but had a subtle misunderstanding of the trigger format. Ask it to walk you through the logic first.

How Tools Know What They Can Do

Every tool in the agent’s context comes with a schema entry in TOOLS.md. This isn’t a separate API spec — it’s another Markdown file in the workspace:

## browser.navigate
Navigate the browser to a URL.
Parameters: url (string, required)
Returns: page title, current URL

## file.read
Read a file from the workspace.
Parameters: path (string, required)
Returns: file contents as string

## system.run
Execute a shell command.
Parameters: command (string, required)
Requires: elevated session
Returns: stdout, stderr, exit code

The agent reads TOOLS.md before each session. It knows what tools exist, what they do, and what permissions they require. When it writes a skill that uses system.run, it knows to include a note that the user needs to enable /elevated on first.

This is the “agent reads source code before using tools” principle that makes OpenClaw qualitatively different from agents that just get a tool schema. The agent understands mechanism, not just interface.

Extended Thinking and Verbosity

Two per-session parameters control how much cognitive work the agent does and how much it shows you:

Thinking levels (/thinking <level>):

off — no extended thinking, fastest
minimal, low, medium, high, xhigh — progressively more reasoning tokens
Claude models only; other providers ignore this setting

Verbosity levels (/verbose <level>):

Controls how much intermediate reasoning the agent shares in its response
Higher verbosity = see the agent’s step-by-step reasoning
Lower verbosity = concise final answers only

For a “what’s 2+2” question, thinking: off, verbose: low is correct. For “analyze this codebase and find security issues,” thinking: high, verbose: medium is better.

Agent-to-Agent Coordination

OpenClaw supports multiple agents on the same Gateway. The cross-session tools enable agents to talk to each other:

sessions_list — see what other agents/sessions exist
sessions_history <session> — read another session’s transcript
sessions_send <session> <message> — send a message to another agent

Agent-to-agent coordination:

Combined with AGENTS.md routing, this enables “one CEO, many specialists” patterns. The personal agent orchestrates the research agent without the user having to switch contexts or know the research agent exists.

The skills system and memory architecture are what turn a capable AI API into a personal agent. Anyone can call Claude’s API. The hard part is building the scaffolding that makes the LLM know who you are, remember what you care about, and extend its own capabilities as your needs evolve. That’s what OpenClaw built.

Practiceoverflow

Discussion about this post

Ready for more?