Security

Three layers of safety. Capability boundaries define what the agent can do. Tripwires monitor for unexpected behavior. Ripcord rolls back when needed.

The problem

AI agents that can read files, run commands, and browse the web are powerful. They are also dangerous. Most tools handle this with one of two approaches: either they ask for permission on every action (slow, annoying, trains users to click "allow" without reading), or they run with no guardrails at all.

Fawx takes a different approach. The agent operates freely within defined boundaries. Safety enforcement is compiled into the binary, not layered on through prompts. The agent cannot see, modify, or bypass the safety kernel.

Three-layer architecture

Layer 1: Capability boundaries

Define what the agent can and cannot do. Actions outside boundaries get an immediate structured denial. The agent sees the denial and adjusts its approach. No waiting, no modal popups.

Layer 2: Tripwire monitoring

Silent boundaries within the capability space. When the agent crosses a tripwire (e.g. modifying a sensitive file, accessing a new directory for the first time), monitoring activates and a journal entry is created. The agent never knows.

Layer 3: Ripcord rollback

Atomic rollback of file and git operations from the tripwire activation point. If something goes wrong, you can undo every file and git change the agent made since the tripwire was crossed. Shell commands and API calls cannot be reversed, so they are audit-logged instead.

Capability mode

The default permission mode. The agent receives a clear set of capabilities at session start:

[permissions]
mode = "capability"
preset = "standard"

When the agent tries an action outside its capabilities, it receives a structured response:

{
  "denied": true,
  "action": "shell",
  "reason": "Shell execution is not permitted under the current capability set.",
  "suggestion": "Request a capability upgrade or use a different approach."
}

The agent sees this and adapts. No user intervention required. No blocking modal. No timeout.
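
As a rough illustration of the flow above, the following Python sketch checks an action against a capability set and builds a denial shaped like the JSON response. The function name `check_action`, the capability names, and the set-based representation are assumptions made for illustration. The real enforcement lives in the compiled kernel, and the contents of the "standard" preset are not listed in this document.

```python
import json

# Hypothetical capability set; the real "standard" preset is not documented here.
EXAMPLE_CAPABILITIES = frozenset({"read_file", "write_file"})

# Human-readable labels, used only to word the denial message (assumed).
ACTION_LABELS = {"shell": "Shell execution"}


def check_action(action, capabilities=EXAMPLE_CAPABILITIES):
    """Return None if the action is allowed, else a structured denial dict."""
    if action in capabilities:
        return None
    label = ACTION_LABELS.get(action, f"Action '{action}'")
    return {
        "denied": True,
        "action": action,
        "reason": f"{label} is not permitted under the current capability set.",
        "suggestion": "Request a capability upgrade or use a different approach.",
    }


if __name__ == "__main__":
    # Prints a denial with the same four fields as the response shown above.
    print(json.dumps(check_action("shell"), indent=2))
```

The check returns immediately in both cases, which is the point of capability mode: the agent gets an answer it can act on without waiting for a person.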

Prompt mode (opt-in)

For users who prefer per-action approval, prompt mode is available:

[permissions]
mode = "prompt"

In this mode, restricted actions pause and wait for explicit approval through the UI. This is useful during initial setup when you want to observe what the agent does before granting broader access.

Tripwires

Tripwires are boundaries within the capability space. They do not block actions. Instead, they silently activate monitoring when crossed.

What triggers a tripwire

Examples include modifying a sensitive file or accessing a new directory for the first time.

What happens on crossing

  1. A journal entry is created with full context: what was attempted, what files were touched, the conversation state.
  2. Monitoring escalates for the rest of the session. Subsequent actions in the same category are logged in detail.
  3. The user receives a notification (configurable: silent, banner, or interrupt).
  4. The agent continues working. It has no awareness that a tripwire was crossed.
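
The four steps above can be modeled in a few lines. This is a sketch under stated assumptions, not Fawx's implementation: the class `TripwireMonitor`, its field names, and the idea of defining tripwires as a set of sensitive paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TripwireMonitor:
    """Illustrative model: tripwires never block, they only record and escalate."""
    sensitive_paths: frozenset                      # hypothetical tripwire definitions
    notify_mode: str = "banner"                     # "silent", "banner", or "interrupt"
    journal: list = field(default_factory=list)
    escalated: set = field(default_factory=set)     # categories under detailed logging
    notifications: list = field(default_factory=list)

    def observe(self, category, path, context=""):
        """Record an action. Always returns True because tripwires do not block."""
        if path in self.sensitive_paths:
            # Step 1: journal entry with the available context.
            self.journal.append(
                {"event": "tripwire", "category": category, "path": path, "context": context}
            )
            # Step 2: escalate monitoring for this category for the rest of the session.
            self.escalated.add(category)
            # Step 3: queue a user notification in the configured mode.
            self.notifications.append((self.notify_mode, path))
        elif category in self.escalated:
            # Later actions in an escalated category are logged in detail.
            self.journal.append(
                {"event": "detail", "category": category, "path": path, "context": context}
            )
        # Step 4: nothing is reported back to the agent; the action proceeds.
        return True
```

Note that `observe` has the same return value whether or not a tripwire was crossed, so the caller (the agent side) cannot distinguish the two cases.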

Ripcord

When a tripwire is crossed, Fawx creates a rollback point. If you decide the agent went wrong, you can pull the ripcord:

fawx ripcord pull

This atomically reverts all file changes and git operations since the tripwire activation point.
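
To make the idea of a rollback point concrete, here is a minimal Python sketch that snapshots files before they change and restores them on demand. The class `RollbackPoint` and its methods are hypothetical. The sketch covers plain files only, does not handle git operations, and is not atomic the way the real ripcord is described to be.

```python
import os


class RollbackPoint:
    """Illustrative snapshot of file contents taken from the tripwire activation point."""

    def __init__(self):
        # path -> original bytes, or None if the file did not exist yet
        self._snapshots = {}

    def track(self, path):
        """Remember a file's state before its first modification."""
        if path in self._snapshots:
            return  # keep the earliest snapshot only
        if os.path.exists(path):
            with open(path, "rb") as f:
                self._snapshots[path] = f.read()
        else:
            self._snapshots[path] = None

    def pull(self):
        """Restore every tracked file to its snapshotted state."""
        for path, original in self._snapshots.items():
            if original is None:
                # The file was created after the snapshot: remove it.
                if os.path.exists(path):
                    os.remove(path)
            else:
                with open(path, "wb") as f:
                    f.write(original)
```

A caller would invoke `track(path)` before each write and `pull()` to revert, which mirrors the role of `fawx ripcord pull` for file changes.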

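What can be rolled back

File changes and git operations made since the tripwire activation point.

What cannot be rolled back

Shell commands and API calls. These are audit-logged instead.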

For irreversible actions, the capability boundary (Layer 1) is the enforcement point. Actions that cannot be undone require explicit capability grants.

Ripcord TTL

Rollback points expire at the end of the session by default, with a 24-hour hard cap. This prevents unbounded storage growth while keeping rollback available for the duration of active work.
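
The expiry rule can be written as a small function. This sketch assumes the 24-hour cap is measured from the moment the rollback point is created, which the text above does not specify; the name `rollback_expiry` is hypothetical.

```python
from datetime import datetime, timedelta

HARD_CAP = timedelta(hours=24)  # the 24-hour hard cap stated above


def rollback_expiry(created_at, session_end=None):
    """Return when a rollback point expires.

    The point expires at session end, but never later than created_at + 24h
    (assumption: the cap counts from creation time).
    """
    cap = created_at + HARD_CAP
    if session_end is None:
        # Session still open: only the hard cap is known so far.
        return cap
    return min(session_end, cap)
```

For a session that ends two hours after the rollback point is created, the point expires with the session. For a session left open for 30 hours, the hard cap applies first.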

The kernel

All three layers are implemented in the safety kernel, a compiled Rust module that cannot be modified at runtime.

The agent communicates with the kernel through a defined interface. It cannot inspect, modify, or bypass the kernel. This separation is architectural, enforced by the Rust type system and module boundaries.

Why compiled safety matters: Prompt-based safety rules can be overridden by clever prompt injection. A compiled safety kernel operates at a layer the model cannot reach. The rules are code, and the agent has no API to change them.

Credential security

API keys and OAuth tokens are encrypted at rest using AES-256-GCM. The encryption key is derived from a machine-specific secret. Credentials are decrypted only at the moment of use and never written to logs, conversation history, or temporary files.
