Runbooks & Skills

Voxeltron's AI agent uses a 3-layer architecture to handle everything from single commands to complex multi-step operational procedures — with built-in approval gates, rollback support, and real-time streaming progress in the TUI.

The 3-layer architecture: Tools are atomic actions (query logs, restart a container). Skills compose multiple tools into a diagnostic or remediation workflow. Runbooks are full operational procedures with approval gates, rollback steps, and structured output — the kind of thing you'd hand to an on-call engineer at 3am.
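
The relationship between the three layers can be sketched with a few plain data types. This is a minimal illustration, not Voxeltron's internal model: the class names, fields, and stub tools below are all hypothetical, chosen only to show how a skill composes tools and how a runbook adds approval and rollback on top.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    """Layer 1: one atomic action, such as querying logs."""
    id: str
    run: Callable[[dict], dict]


@dataclass
class Skill:
    """Layer 2: an ordered composition of tools."""
    id: str
    tools: list[Tool]

    def run(self, params: dict) -> list[dict]:
        # Invoke each tool in order and collect its result.
        return [tool.run(params) for tool in self.tools]


@dataclass
class RunbookStep:
    name: str
    tool: Tool
    approval: str = "auto"  # "auto" or "required"


@dataclass
class Runbook:
    """Layer 3: a procedure with approval gates and rollback steps."""
    id: str
    steps: list[RunbookStep]
    rollback_steps: list[RunbookStep] = field(default_factory=list)


# Stub tools standing in for the real ones (hypothetical return shapes).
query_logs = Tool("query_logs", lambda p: {"tool": "query_logs", "ok": True})
list_alerts = Tool("list_alerts", lambda p: {"tool": "list_alerts", "ok": True})

health = Skill("health-check", [query_logs, list_alerts])
results = health.run({"project": "demo"})
print([r["tool"] for r in results])
```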

Built-in Tools

Ten tools are available out of the box. Each is an atomic operation the AI agent can invoke:

Tool              Description
query_logs        Search and filter application logs
query_metrics     Query CPU, memory, network, and custom metrics
list_alerts       List active and recent alerts
project_status    Get project health and deployment status
list_deployments  List deployment history with metadata
container_action  Start, stop, or restart containers
scale_service     Scale a service up or down
rollback          Roll back to a previous deployment
create_backup     Create a database or volume backup
rotate_secret     Rotate secrets and credentials

Built-in Skills

Skills compose multiple tools into higher-level diagnostic and remediation workflows. Each skill runs a series of steps, collecting data and making decisions along the way.

Skill                Steps  Description
health-check         5      Comprehensive service health assessment: checks metrics, logs, alerts, and deployment state
deployment-analysis  6      Analyzes a deployment for regressions, error spikes, and performance changes
log-investigation    7      Investigates error patterns, correlates across services, and identifies root cause
cost-optimizer       4      Analyzes resource usage and recommends rightsizing and scaling adjustments

Built-in Runbooks

Runbooks are full operational procedures with approval gates and rollback support. They encode the kind of multi-step remediation an experienced SRE would follow.

Runbook                     Category        Severity  Steps
disk-cleanup                Infrastructure  Warning   5
restart-loop-detection      Application     Critical  7
certificate-expiry-renewal  Security        High      6

Streaming Progress

When a skill or runbook executes, the TUI displays real-time step-by-step progress. Each step streams its status as it moves from pending to running and then to completed (shown as done in the output) or failed:

Running runbook: restart-loop-detection
────────────────────────────────────────────

[1/7] Identify crash-looping containers .......... done
[2/7] Collect recent logs from affected pods ...... done
[3/7] Analyze crash patterns and exit codes ....... done
[4/7] Check resource limits and OOM events ........ running
[5/7] Correlate with recent deployments ........... pending
[6/7] Apply remediation ........................... pending
[7/7] Verify service stability .................... pending

Progress updates arrive via the gRPC streaming connection between the TUI and daemon, so you see each step the moment it begins — no polling, no waiting for the entire runbook to finish.
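
The push-based update model can be sketched as a loop that consumes status events and redraws the step list after each one. This is only an illustration: the event shape `(step_number, status)`, the function names, and the mapping from `completed` to the `done` label are assumptions inferred from the sample output, and a plain Python list stands in for the gRPC stream.

```python
from typing import Iterable, Iterator

# Assumed mapping from status names to the labels shown in the TUI.
STATUS_LABEL = {
    "pending": "pending",
    "running": "running",
    "completed": "done",
    "failed": "failed",
}


def render_step(index: int, total: int, name: str, status: str, width: int = 52) -> str:
    """Format one progress line, padding the step name with dots."""
    label = f"[{index}/{total}] {name} "
    return f"{label.ljust(width, '.')} {STATUS_LABEL[status]}"


def follow(names: list[str], events: Iterable[tuple[int, str]]) -> Iterator[list[str]]:
    """Yield a freshly rendered frame each time a step changes status."""
    statuses = ["pending"] * len(names)
    for step_number, status in events:
        statuses[step_number - 1] = status
        yield [
            render_step(i, len(names), name, current)
            for i, (name, current) in enumerate(zip(names, statuses), start=1)
        ]


names = [
    "Identify crash-looping containers",
    "Collect recent logs from affected pods",
    "Analyze crash patterns and exit codes",
]
# Stand-in for the stream of updates pushed by the daemon.
events = [(1, "running"), (1, "completed"), (2, "running")]

frames = list(follow(names, events))
print("\n".join(frames[-1]))
```

Because the consumer reacts to each event as it arrives, no frame is produced until something actually changes, which is the practical difference from a polling loop.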

Approval Gates

Runbook steps that modify infrastructure require explicit approval before execution. When a step has approval: required, the TUI pauses and prompts for confirmation:

[3/5] Identified 4.2 GB of reclaimable space in /var/log

[4/5] Delete old log files and reclaim disk space
      This will remove 847 files older than 7 days.

      Approve this step? [y] yes  [n] skip

Press y to approve and continue, or n to skip the step. Skipped steps are logged but do not trigger a rollback. If a step with approval: required fails after you approve it, the runbook automatically executes its rollback steps in reverse order.

Steps with approval: auto execute without prompting — use this only for read-only or non-destructive operations like querying logs or checking status.
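
The gate, skip, and rollback rules described above can be summarized in a short executor sketch. Everything here is hypothetical (the `Step` type, `run_runbook`, and the log strings), and one behavior is an explicit assumption: the text does not say what happens when an `approval: auto` step fails, so this sketch simply stops without rolling back.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    approval: str = "auto"  # "auto" or "required"


def run_runbook(
    steps: list[Step],
    rollback_steps: list[Step],
    approve: Callable[[Step], bool],
) -> list[str]:
    """Run steps in order and return a log of what happened."""
    log: list[str] = []
    for step in steps:
        if step.approval == "required" and not approve(step):
            # Skipped steps are logged but do not trigger a rollback.
            log.append(f"skipped: {step.name}")
            continue
        try:
            step.action()
            log.append(f"completed: {step.name}")
        except Exception:
            log.append(f"failed: {step.name}")
            if step.approval == "required":
                # An approved step failed: run rollback steps in reverse order.
                for rb in reversed(rollback_steps):
                    rb.action()
                    log.append(f"rollback: {rb.name}")
            # Assumption: stop the runbook after any failure.
            break
    return log


def fail() -> None:
    raise RuntimeError("simulated failure")


steps = [Step("check", lambda: None), Step("delete", fail, approval="required")]
rollbacks = [Step("rb-1", lambda: None), Step("rb-2", lambda: None)]

approved_log = run_runbook(steps, rollbacks, approve=lambda s: True)
skipped_log = run_runbook(steps, rollbacks, approve=lambda s: False)
print(approved_log)
print(skipped_log)
```

In the TUI the `approve` callback would correspond to the y/n prompt; passing it in as a function keeps the executor testable without a terminal.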

Custom Runbooks

Runbooks are defined in YAML and can be added to your project or shared across your organization. Each runbook specifies its steps, tool invocations, approval requirements, and rollback procedures:

# ~/.config/voxeltron/runbooks/my-runbook.yaml
id: my-runbook
name: My Custom Runbook
category: infrastructure
steps:
  # Read-only check, so it runs without prompting
  - id: step-1
    name: Check status
    tool_id: project_status
    approval: auto
  # Modifies infrastructure, so the TUI prompts before running it
  - id: step-2
    name: Restart service
    tool_id: container_action
    approval: required
    parameters:
      action: restart
# Executed in reverse order if an approved step fails
rollback_steps:
  - id: rb-1
    name: Revert restart
    tool_id: container_action
    parameters:
      action: restart
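
Before dropping a file like this into place, it can help to sanity-check its shape. The sketch below validates a runbook that has already been parsed into a Python dict. The rules it enforces (which keys are required, `approval` being optional on rollback steps) are assumptions read off the example above, not a published schema, and `validate_runbook` is a hypothetical helper.

```python
ALLOWED_APPROVAL = {"auto", "required"}


def validate_runbook(doc: dict) -> list[str]:
    """Return a list of problems found; an empty list means the shape looks fine."""
    errors: list[str] = []
    # Assumed required top-level keys, based on the example file.
    for key in ("id", "name", "steps"):
        if key not in doc:
            errors.append(f"missing top-level key: {key}")
    for section in ("steps", "rollback_steps"):
        for i, step in enumerate(doc.get(section, [])):
            for key in ("id", "name", "tool_id"):
                if key not in step:
                    errors.append(f"{section}[{i}]: missing {key}")
            approval = step.get("approval")
            if approval is not None and approval not in ALLOWED_APPROVAL:
                errors.append(
                    f"{section}[{i}]: approval must be one of "
                    f"{sorted(ALLOWED_APPROVAL)}, got {approval!r}"
                )
    return errors


# The example runbook from above, as a parsed dict.
runbook = {
    "id": "my-runbook",
    "name": "My Custom Runbook",
    "category": "infrastructure",
    "steps": [
        {"id": "step-1", "name": "Check status",
         "tool_id": "project_status", "approval": "auto"},
        {"id": "step-2", "name": "Restart service",
         "tool_id": "container_action", "approval": "required",
         "parameters": {"action": "restart"}},
    ],
    "rollback_steps": [
        {"id": "rb-1", "name": "Revert restart",
         "tool_id": "container_action", "parameters": {"action": "restart"}},
    ],
}
errors = validate_runbook(runbook)

broken = {"id": "x", "steps": [{"id": "s", "approval": "maybe"}]}
broken_errors = validate_runbook(broken)
print(errors, broken_errors)
```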

Place custom runbook files in ~/.config/voxeltron/runbooks/ or in your project's .voxeltron/runbooks/ directory. The daemon discovers them at startup and makes them available to the AI agent and TUI.
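
To see which files a given setup would expose, the two locations can be scanned with a few lines of Python. This is a rough sketch of the discovery idea, not the daemon's actual logic: the function name is hypothetical, and matching only `*.yaml` and listing the user directory before the project directory are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional


def discover_runbooks(project_root: Path, home: Optional[Path] = None) -> list[Path]:
    """List runbook files in the user-level and project-level directories."""
    home = home or Path.home()
    search_dirs = [
        home / ".config" / "voxeltron" / "runbooks",
        project_root / ".voxeltron" / "runbooks",
    ]
    found: list[Path] = []
    for directory in search_dirs:
        if directory.is_dir():
            # Sort for a stable order within each directory.
            found.extend(sorted(directory.glob("*.yaml")))
    return found


# Demonstration against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    user_dir = root / "home" / ".config" / "voxeltron" / "runbooks"
    project_dir = root / "proj" / ".voxeltron" / "runbooks"
    user_dir.mkdir(parents=True)
    project_dir.mkdir(parents=True)
    (user_dir / "my-runbook.yaml").write_text("id: my-runbook\n")
    (project_dir / "team-runbook.yaml").write_text("id: team-runbook\n")
    names = [p.name for p in discover_runbooks(root / "proj", home=root / "home")]

print(names)
```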