Documentation — avacli

Getting Started

Install the client and send your first message

Install via APT (Debian / Ubuntu)

Add the signing key, source list, and install in one flow:

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://packages.avalynn.ai/key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/avalynn.gpg
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/avalynn.gpg] https://packages.avalynn.ai stable main" | sudo tee /etc/apt/sources.list.d/avalynn.list
sudo apt update && sudo apt install avacli

Install from tarball

Download the tarball for your architecture from the Download page, then extract and run the installer:

tar xzf avacli-latest-linux-x86_64.tar.gz
cd avacli-*
sudo ./install.sh

Connect xAI

Open-source avacli talks to the xAI API directly. Add your key once via CLI or the Settings page in the web UI:

API key avacli --set-api-key xai-YOUR_KEY

Stored under ~/.avacli/config.json. Alternatively set XAI_API_KEY in the environment.

Start the web UI

Launch the local agent interface in your browser:

Web UI avacli serve

Default bind http://0.0.0.0:8080 (override with --host / --port). Run avacli --generate-master-key once to set the admin password.

CLI chat

Send a one-shot prompt without starting the web UI:

Chat avacli chat "hello"

Sends the message and prints the agent response to stdout (requires API key).

Client Configuration

Modes, keys, workspace, and model selection

Web UI configuration

Most settings can also be configured through the built-in web UI at http://localhost:8080 (or your chosen port). Open the Settings page in the sidebar to set your API key, choose a model, configure billing, and change the workspace directory — no CLI flags required.

Direct xAI usage

avacli always uses the xAI API with your own key (CLI, environment, or Settings in the web UI). It does not proxy chat through the Avalynn platform or consume platform tokens.

Setting API keys

Ways to provide your xAI key to avacli:

xAI key avacli --set-api-key xai-xxxxxxxxxxxx

Saves to ~/.avacli/config.json.

Environment export XAI_API_KEY=xai-xxxxxxxxxxxx

Takes precedence when set.

Web UI Settings → API key

Persisted under ~/.avacli/ alongside other preferences.

Workspace directory

Point the agent at a specific project directory for file operations and code analysis:

Workspace avacli --workspace /home/user/my-project serve

Short form: -d /path. The agent scopes file reads and writes to this directory.

Port configuration

Override the default web UI port (8080):

Custom port avacli serve --port 9090

The web UI will be available at http://localhost:9090 (or your host bind).

Model selection

Choose which xAI model to use for completions:

Model avacli --model grok-4-1-reasoning serve

Use avacli --list-models to see all available models.

Agent Capabilities

What the agent can do

File operations

The agent can read, write, and search files within the configured workspace. It respects .gitignore patterns and applies size limits to prevent reading excessively large files.

Code analysis & project awareness

When pointed at a project directory, the agent indexes the file tree and understands project structure, dependencies, and language-specific patterns. It can navigate codebases, find definitions, and suggest contextual edits.

Streaming responses with reasoning

Responses stream token-by-token in real time. When the model uses extended thinking, the reasoning trace is displayed in a collapsible block above the final answer so you can follow the agent's chain of thought.

Tool execution pipeline

The agent follows a tool-use loop: it receives your message, decides which tools to invoke (file read, file write, search, shell commands), executes them, and feeds the results back into the model for the next step. This continues until the task is complete or the model decides no more tool calls are needed.

Available tools (50+)

The agent has access to the following tools, grouped by category. Availability depends on the active mode — see the mode table below.

Files, search, & memory (Question / Plan modes):

Tool	Description
`read_file`	Read the contents of a file (optionally by line range).
`write_file`	Create a new file or overwrite an existing one.
`edit_file`	Apply search-and-replace edits to a file.
`undo_edit`	Revert the last edit made to a file.
`list_directory` / `glob_files`	List or glob files in the workspace.
`search_files`	Search file contents for a pattern.
`read_url` / `web_search` / `x_search`	Fetch URLs or search the web / X in real time via xAI.
`add_memory` / `search_memory` / `forget_memory`	Persistent session memory across turns.
`read_todos` / `create_update_todo`	Manage the agent’s to-do list.
`read_project_notes`	Read `GROK_NOTES.md` + session memory for context.
`ask_user`	Prompt the user for clarification or confirmation.

Shell, tests, & media (Agent mode):

Tool	Description
`run_shell`	Execute a shell command (non-interactive, configurable timeout).
`run_tests`	Auto-detect and run the project’s test suite.
`generate_image` / `edit_image`	Generate or edit images with Grok Imagine.
`generate_video` / `edit_video` / `extend_video`	Generate, edit, or extend videos.
`create_tool` / `modify_tool` / `delete_tool`	Tool Forge: build custom tools at runtime.
`research_api` / `setup_api` / `call_api`	Research, configure, and invoke third-party HTTP APIs.
`vault_list` / `vault_store` / `vault_retrieve` / `vault_remove`	Encrypted secrets vault for API keys and credentials.
`summarize_and_new_chat`	Compress the session and start a fresh chat with continuity.

Database, knowledge base, & apps:

Tool	Description
`db_query`	Read from the main agent database.
`save_article` / `search_articles`	Persistent knowledge base.
`create_app` / `edit_app_file` / `list_apps`	Generate static or dynamic web apps served from `/apps/<slug>/`.
`app_token`	Return the `agent_token` for a given app (for manual SDK calls).
`app_db_execute`	Seed or mutate data in an app’s private SQLite DB.
`app_db_set_cap`	Set a per-app DB size cap (MB), overriding the global default.

Services & sub-agents:

Tool	Description
`create_service` / `manage_service` / `delete_service` / `list_services`	Manage long-running services (tick-based or `type:"process"`).
`tail_service_logs`	Fetch the most recent log lines from any service (Question mode, read-only).
`spawn_subagent`	Delegate a bounded task to a child agent with `allowed_paths` and `allowed_tools`.
`wait_subagent`	Block on a child task’s completion (with timeout).
`cancel_subagent` / `list_subagents`	Cancel or list delegated tasks.

Batch API (v2.3 — flat 50% discount, stacks with prompt caching):

Tool	Description
`batch_submit`	Queue a batch of prompts for async (≤24h) processing at 50% cost. Accepts friendly `{system, user, model, max_tokens}` per prompt or raw `messages[]`.
`get_batch` / `list_batches`	Check batch state, pending / succeeded / failed counts, and accumulated cost.
`get_batch_results`	Fetch completed results (paginated, cache-first). Terminal batches are snapshotted locally so no xAI round-trip is needed.
`cancel_batch`	Cancel an in-flight batch before it starts processing.

Tool availability by mode

Not all tools are available in every mode:

Mode	Read-only	Write	Shell & tests	Media
Question	Yes	—	—	—
Plan	Yes	Yes	—	—
Agent	Yes	Yes	Yes	Yes

Platform Capabilities (v2.2)

Long-running services, per-app SQLite, and scoped sub-agents

avacli is no longer just a chat loop. It’s a self-hosted runtime for long-lived processes, per-app storage, and delegated agent work — all sandboxed under ~/.avacli/, all xAI-native. The three pillars landed together in v2.2:

Long-running services

Services can now be type:"process" for Python / Node / native daemons that must stay up (Discord bots, workers, relays). avacli supervises them with a cross-platform ProcessSupervisor (POSIX fork+setpgid, Windows CreateProcessW + Job Object), auto-restarts per your policy, and streams logs in real time.

Per-service workdir under ~/.avacli/services/<id>/ (configurable). Contains venv/, node_modules/, logs/, state/.
Auto dep install when deps.auto_install: true — first start runs python -m venv + pip install -r requirements.txt (or npm ci). Re-runs only when the hash of requirements.txt / package.json changes.
Restart policies: always, on_failure, never — with exponential backoff and a per-hour restart ceiling.
Vault-backed env: set env values to vault:token_name and avacli resolves them at spawn from the encrypted vault — plaintext secrets never touch services.config.
SSE logs: tail any service’s log buffer live via GET /api/services/:id/logs/stream.
Agent read-only access: tail_service_logs is available in Question mode for diagnosing without escalating.

Per-app SQLite

Every agent-generated app gets its own authenticated database at ~/.avacli/app_data/<slug>.db. Opened with WAL + foreign keys and a sqlite3_set_authorizer hook that denies ATTACH / DETACH and locks the app inside its own sandbox.

Auto-injected _sdk.js exposes window.avacli.db.query(sql, params), .execute, .schema, .main.query(view, params), and .ai.chat({messages, model}). Zero plumbing; the slug + agent_token are baked into the served JS.
Whitelisted main-DB views: apps can read articles_public, apps_directory, and my_app from the main database via /api/apps/:slug/main/query. Raw SQL against avacli.db is never exposed.
Size cap: global default apps_db_size_cap_mb=256 (range 16–16384) with per-app override. Over-quota writes return HTTP 507; reads still work so an app can drain its data.
Auth: Authorization: Bearer <agent_token> + same-origin referer. Master-key auth is bypassed for these routes.

Scoped sub-agents

The agent can delegate bounded work to a child xAI agent with spawn_subagent({goal, allowed_paths, allowed_tools, model?}). Each child runs on its own thread, with tool access gated by a ScopedToolExecutor and writes serialised by a LeaseManager.

Default-deny: subagents_max_depth=0 disables spawn entirely. Raising the slider in Settings to 1–16 enables progressively deeper nesting. No restart required.
Path & tool whitelists: allowed_paths (glob, canonicalised) and allowed_tools (intersected with Agent-mode set). run_shell and run_tests are denied unless explicitly whitelisted.
Write leases: before any mutating tool call, the executor does INSERT OR FAIL INTO agent_leases. Second writer gets {"error":"resource_locked","holder":"<task_id>"} — use wait_subagent(holder) or re-plan.
Lifecycle: pending → running → succeeded / failed / cancelled. wait_subagent blocks via condition_variable instead of polling SQLite.
xAI only, by design: no LLMClient indirection. Children share the same XAIClient as the root — only the model string varies.

Settings (Platform capabilities)

The Settings page in the web UI now has a “Platform capabilities” section with three controls:

Key	Default	Description
`services_workdir_root`	`~/.avacli/services`	Root directory for per-service workdirs (`venv`, `node_modules`, logs, state). Must be an absolute path with an existing parent.
`apps_db_size_cap_mb`	`256`	Global default size cap per app database (range 16–16384 MB). Per-app override via `app_db_set_cap`.
`subagents_max_depth`	`0` (disabled)	Maximum nesting depth for sub-agents (range 0–16). `0` disables `spawn_subagent` entirely.

All three are read fresh on every call. Changes take effect immediately — no server restart.

New HTTP endpoints

Method	Path	Description
GET	`/api/services/:id/status`	Live process details (alive, pid, uptime_s, restart_count, last_exit_code).
POST	`/api/services/:id/signal`	`{signal: "term" \| "kill" \| "restart"}`.
GET	`/api/services/:id/logs/stream`	SSE tail with 20s heartbeat.
POST	`/api/apps/:slug/db/{query,execute}`	Per-app SQLite query / mutation (requires `agent_token`).
GET	`/api/apps/:slug/db/{schema,export}`	Inspect schema or dump the app DB.
POST	`/api/apps/:slug/main/query`	Whitelisted view access to the main DB.
GET	`/apps/:slug/_sdk.js`	Auto-injected client SDK.
GET	`/api/tasks`	List sub-agent tasks (filter by `parent` / `status`).
POST	`/api/tasks` / `/api/tasks/:id/cancel`	Spawn or cancel a sub-agent task.

Cost Optimization (v2.3)

Prompt caching, sticky routing, reasoning preservation, and Batch API

v2.3 stops avacli from evicting its own prompt cache and layers xAI’s Batch API on top. Cached input on grok-4.20-* bills at ~10× less than fresh input (~4× on grok-4-1-fast); the Batch API adds a flat 50% discount that stacks multiplicatively — a cached batched request bills at roughly 5% of the sticker price. All five phases landed in a single release.

Byte-stable system prefix

Before v2.3, the agent re-wrote messages[0] on every tool turn with the session’s current TODOs, memory, and recent edits — which invalidated xAI’s KV cache every single turn (effective hit rate: ~0%). The system prompt is now split into two layers:

Static prefix — GROK_SYSTEM_PROMPT.md + workspace env. Set once per run() entry, byte-identical across turns, cached by xAI.
Dynamic context — TODOs, memory, recent edits, project notes. Injected as a dedicated user-role message immediately before the latest user turn, so new turns append to a byte-stable prefix. Stripped on write-back so persisted sessions don’t accumulate stale context refreshes.

Sticky routing — `conv_id` per session

A byte-stable prefix isn’t enough — xAI has a pool of servers, and without a routing hint your turn 2 can land on a cold box. Every avacli session now mints a conversation id once and reuses it forever:

Chat completions → x-grok-conv-id: conv-<16hex> HTTP header.
Responses API → prompt_cache_key body field with the same value.
Persisted in the session under conv_id and never regenerated — renaming a session doesn’t evict the cache. Old sessions get a fresh id on first load.
Sub-agents inherit the prefix — each child computes parent_conv_id + ":" + task_id, so siblings share routing locality and reuse each other’s cached sub-agent scaffolding.

Cache-hit observability

You can’t optimise what you can’t measure. Every response now reports cached-token counts across every surface:

Logs — [cache] prompt=8654 cached=7800 ratio=90% fires on every response (grep journalctl -u avacli | grep "\[cache\]").
SSE stream — /api/chat usage and done events include cached_tokens, reasoning_tokens, billable_prompt_tokens, cache_hit_rate.
Headless JSON — avacli --format json output carries the same breakdown.
Sub-agent results — result.usage on spawn_subagent / wait_subagent carries the full token breakdown per child.
Usage dashboard — the /usage page now attributes savings over time by rolling up cached_tokens and reasoning_tokens from usage_records.

Responses API chaining (`previous_response_id`)

When streaming from /v1/responses, xAI stores the server-side conversation state for 30 days. avacli now reuses it so follow-up turns send only the new input items — the server replays everything before it (including the encrypted reasoning trace) and the whole chain hits the same cache.

Automatic — AgentEngine tracks lastResponseId and lastSubmittedCount per session, slices apiMessages to the tail on chained calls, and skips instructions when the server already has the prefix.
Transparent fallback — if xAI reports the stored response expired or went missing, avacli transparently falls back to a full-history resend for the rest of the run, then resumes chaining from the next response.
Safe boundaries — chain state is cleared on MAX_TOOL_TURNS and when a caller clears history (use_history:false), so we never chain into a mismatched server state.
Panic switch — AgentEngine::setUsePreviousResponseId(false) disables chaining entirely. Default: on.

Reasoning preservation (chat completions)

Reasoning models (grok-4-reasoning, grok-4-fast-reasoning) stream a separate reasoning_content delta on the chat-completions path. avacli used to inline it into assistant content as <thought>…</thought>, which broke xAI’s reasoning cache on replay — the round-tripped messages no longer matched the originally-cached ones.

v2.3 routes raw reasoning_content into a dedicated Message::reasoning_content field, persists it per-message across save/load, and echoes it back on subsequent request bodies. The on-wire message is byte-identical to what the model produced, so the reasoning cache stays warm. The UI still receives <thought>…</thought> via the existing onStreamDelta path — transcript splitters are unchanged.

Batch API — 50% off, stacks with caching

xAI’s Batch API (/v1/messages/batches) returns a flat 50% discount on every token (input, cached, output, reasoning) and gives the service 24 hours to complete the work. Stacks multiplicatively with prompt caching — a cached batched request bills at 50% × ~10% = ~5% of the sticker price.

Submit a batch from the agent:

> Use batch_submit to summarise these 50 articles into tweets.
→ batch_submit({
    name: "article-tweets-2026-04",
    kind: "chat_completions",
    model: "grok-4.20-fast",
    prompts: [
      { system: "You write tweets.", user: "", max_tokens: 280 },
      ... 49 more ...
    ]
  })
← { "batch_id": "batch_abc123", "num_requests": 50, "state": "processing" }

The background BatchPoller thread (default 30s tick, kicked on submit) reconciles state and snapshots results to batches.results_json once terminal — subsequent get_batch_results calls serve straight from the local SQLite row with zero xAI round-trips.

Ideal workloads (all save ~5× on tokens):

RSS / news digest generation overnight.
Knowledge-base re-summarisation after a schema change.
Evaluation runs / regression harnesses.
Bulk image or prompt generation where latency isn’t user-facing.

Not for interactive chat — batches are explicitly async, typically complete in minutes to hours, and batch_submit is gated to tool-mode ≥ 2 (writes) so read-only sessions can’t accidentally queue paid jobs.

New HTTP endpoints (v2.3)

Method	Path	Description
GET	`/api/batches`	List batches with optional `owner` / `kind` / `state` filters.
POST	`/api/batches`	Create a batch. Accepts `{name, kind, requests:[…]}` to pre-populate, or just `{name, kind}` to populate via a later `POST /api/batches/:id/requests`.
GET	`/api/batches/:id`	Returns the merged local + live snapshot including cost, pending / succeeded / failed counts, and expiry.
GET	`/api/batches/:id/results?limit=&page_token=`	Paginated results (cache-first, falls through to xAI when the local snapshot is empty).
POST	`/api/batches/:id/cancel`	Cancel an in-flight batch.
POST	`/api/batches/:id/refresh`	Force a single-batch poll (also triggered automatically).
POST	`/api/batches/refresh`	Kick the global poller (reconcile all `processing` rows now).
DELETE	`/api/batches/:id`	Remove the local record (does not call xAI). Useful when you’ve archived results elsewhere.

All endpoints are JWT-guarded and owner-scoped. Batch state is persisted in the SQLite batches table (migration v7) with cost stored in 10⁻¹⁰ USD ticks (int64_t) to keep arithmetic free of float drift.

What to expect

On a typical tool-heavy session, prompt-cache hit rate climbs from ~0% on turn 1 to 70–90% from turn 2 onward, visible in the [cache] log line and the cache_hit_rate field on SSE done events. Combined with reasoning preservation and Responses-API chaining, interactive costs on grok-4.20-* drop by roughly 5–10× for long sessions — and by ~20× for any workload you can push into a batch.

CLI Reference

Quick command reference

Command	Description
`avacli serve`	Start the embedded web UI (default when run with no args).
`avacli chat "msg"`	One-shot chat — send a message, print the response, and exit.
`avacli --set-api-key KEY`	Save an xAI API key to `~/.avacli/config.json`.
`avacli --generate-master-key`	Create a random admin password for the web UI.
`avacli --workspace DIR serve`	Run with a specific working directory for file tools.
`avacli --list-models`	Show xAI models available to your API key.
`avacli --port 9090 serve`	Override the default web UI port (8080).
`avacli login` / `logout`	Manage CLI session auth against the master password.

Run avacli --help for the full list of flags and options.

Building from Source

Compile avacli yourself

Prerequisites

You need a C++17 compiler, CMake 3.16+, and a few system libraries:

sudo apt install build-essential cmake libcurl4-openssl-dev libssl-dev libjsoncpp-dev

Build

git clone https://github.com/iBerry420/acos.git
cd acos
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)

The resulting avacli binary is in build/. Copy it to /usr/local/bin/ or run it directly.

Supported platforms

Linux — x86_64 and ARM64 (primary target).
macOS — Intel and Apple Silicon (CI builds planned).
Windows — MSVC / MinGW (experimental, CI builds planned).

Contributing

Help improve avacli

avacli is MIT-licensed and welcomes contributions. The repository lives at github.com/iBerry420/acos.

Issues — report bugs or request features on the GitHub issue tracker.
Pull requests — fork the repo, create a branch, and open a PR against main.
Code style — follow the existing patterns in the codebase. Use clang-format if available.
Testing — make sure cmake --build . -j passes before submitting.

Contents

Getting Started

Install via APT (Debian / Ubuntu)

Install from tarball

Connect xAI

Start the web UI

CLI chat

Client Configuration

Web UI configuration

Direct xAI usage

Setting API keys

Workspace directory

Port configuration

Model selection

Agent Capabilities

File operations

Code analysis & project awareness

Streaming responses with reasoning

Tool execution pipeline

Available tools (50+)

Tool availability by mode

Platform Capabilities (v2.2)

Long-running services

Per-app SQLite

Scoped sub-agents

Settings (Platform capabilities)

New HTTP endpoints

Cost Optimization (v2.3)

Byte-stable system prefix

Sticky routing — conv_id per session

Cache-hit observability

Responses API chaining (previous_response_id)

Reasoning preservation (chat completions)

Batch API — 50% off, stacks with caching

New HTTP endpoints (v2.3)

What to expect

CLI Reference

Building from Source

Prerequisites

Build

Supported platforms

Contributing

Sticky routing — `conv_id` per session

Responses API chaining (`previous_response_id`)