Everything you need to install, configure, and use avacli — the free, open-source AI agent.
Add the signing key, source list, and install in one flow:
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://packages.avalynn.ai/key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/avalynn.gpg
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/avalynn.gpg] https://packages.avalynn.ai stable main" | sudo tee /etc/apt/sources.list.d/avalynn.list
sudo apt update && sudo apt install avacli
Download the tarball for your architecture from the Download page, then extract and run the installer:
tar xzf avacli-latest-linux-x86_64.tar.gz
cd avacli-*
sudo ./install.sh
Open-source avacli talks to the xAI API directly. Add your key once via CLI or the Settings page in the web UI:
avacli --set-api-key xai-YOUR_KEY
Stored under ~/.avacli/config.json. Alternatively set XAI_API_KEY in the environment.
Launch the local agent interface in your browser:
avacli serve
Default bind http://0.0.0.0:8080 (override with --host / --port). Run avacli --generate-master-key once to set the admin password.
Send a one-shot prompt without starting the web UI:
avacli chat "hello"
Sends the message and prints the agent response to stdout (requires API key).
Most settings can also be configured through the built-in web UI at http://localhost:8080 (or your chosen port). Open the Settings page in the sidebar to set your API key, choose a model, configure billing, and change the workspace directory — no CLI flags required.
avacli always uses the xAI API with your own key (CLI, environment, or Settings in the web UI). It does not proxy chat through the Avalynn platform or consume platform tokens.
Ways to provide your xAI key to avacli:
avacli --set-api-key xai-xxxxxxxxxxxx
Saves to ~/.avacli/config.json.
export XAI_API_KEY=xai-xxxxxxxxxxxx
Takes precedence when set.
Settings → API key
Persisted under ~/.avacli/ alongside other preferences.
Point the agent at a specific project directory for file operations and code analysis:
avacli --workspace /home/user/my-project serve
Short form: -d /path. The agent scopes file reads and writes to this directory.
Override the default web UI port (8080):
avacli serve --port 9090
The web UI will be available at http://localhost:9090 (or your host bind).
Choose which xAI model to use for completions:
avacli --model grok-4-1-reasoning serve
Use avacli --list-models to see all available models.
The agent can read, write, and search files within the configured workspace. It respects .gitignore patterns and applies size limits to prevent reading excessively large files.
When pointed at a project directory, the agent indexes the file tree and understands project structure, dependencies, and language-specific patterns. It can navigate codebases, find definitions, and suggest contextual edits.
Responses stream token-by-token in real time. When the model uses extended thinking, the reasoning trace is displayed in a collapsible block above the final answer so you can follow the agent's chain of thought.
The agent follows a tool-use loop: it receives your message, decides which tools to invoke (file read, file write, search, shell commands), executes them, and feeds the results back into the model for the next step. This continues until the task is complete or the model decides no more tool calls are needed.
The agent has access to the following tools, grouped by category. Availability depends on the active mode — see the mode table below.
Files, search, & memory (Question / Plan modes):
| Tool | Description |
|---|---|
read_file | Read the contents of a file (optionally by line range). |
write_file | Create a new file or overwrite an existing one. |
edit_file | Apply search-and-replace edits to a file. |
undo_edit | Revert the last edit made to a file. |
list_directory / glob_files | List or glob files in the workspace. |
search_files | Search file contents for a pattern. |
read_url / web_search / x_search | Fetch URLs or search the web / X in real time via xAI. |
add_memory / search_memory / forget_memory | Persistent session memory across turns. |
read_todos / create_update_todo | Manage the agent’s to-do list. |
read_project_notes | Read GROK_NOTES.md + session memory for context. |
ask_user | Prompt the user for clarification or confirmation. |
Shell, tests, & media (Agent mode):
| Tool | Description |
|---|---|
run_shell | Execute a shell command (non-interactive, configurable timeout). |
run_tests | Auto-detect and run the project’s test suite. |
generate_image / edit_image | Generate or edit images with Grok Imagine. |
generate_video / edit_video / extend_video | Generate, edit, or extend videos. |
create_tool / modify_tool / delete_tool | Tool Forge: build custom tools at runtime. |
research_api / setup_api / call_api | Research, configure, and invoke third-party HTTP APIs. |
vault_list / vault_store / vault_retrieve / vault_remove | Encrypted secrets vault for API keys and credentials. |
summarize_and_new_chat | Compress the session and start a fresh chat with continuity. |
Database, knowledge base, & apps:
| Tool | Description |
|---|---|
db_query | Read from the main agent database. |
save_article / search_articles | Persistent knowledge base. |
create_app / edit_app_file / list_apps | Generate static or dynamic web apps served from /apps/<slug>/. |
app_token | Return the agent_token for a given app (for manual SDK calls). |
app_db_execute | Seed or mutate data in an app’s private SQLite DB. |
app_db_set_cap | Set a per-app DB size cap (MB), overriding the global default. |
Services & sub-agents:
| Tool | Description |
|---|---|
create_service / manage_service / delete_service / list_services | Manage long-running services (tick-based or type:"process"). |
tail_service_logs | Fetch the most recent log lines from any service (Question mode, read-only). |
spawn_subagent | Delegate a bounded task to a child agent with allowed_paths and allowed_tools. |
wait_subagent | Block on a child task’s completion (with timeout). |
cancel_subagent / list_subagents | Cancel or list delegated tasks. |
Batch API (v2.3 — flat 50% discount, stacks with prompt caching):
| Tool | Description |
|---|---|
batch_submit | Queue a batch of prompts for async (≤24h) processing at 50% cost. Accepts friendly {system, user, model, max_tokens} per prompt or raw messages[]. |
get_batch / list_batches | Check batch state, pending / succeeded / failed counts, and accumulated cost. |
get_batch_results | Fetch completed results (paginated, cache-first). Terminal batches are snapshotted locally so no xAI round-trip is needed. |
cancel_batch | Cancel an in-flight batch before it starts processing. |
Not all tools are available in every mode:
| Mode | Read-only | Write | Shell & tests | Media |
|---|---|---|---|---|
| Question | Yes | — | — | — |
| Plan | Yes | Yes | — | — |
| Agent | Yes | Yes | Yes | Yes |
avacli is no longer just a chat loop. It’s a self-hosted runtime for long-lived processes, per-app storage, and delegated agent work — all sandboxed under ~/.avacli/, all xAI-native. The three pillars landed together in v2.2:
Services can now be type:"process" for Python / Node / native daemons that must stay up (Discord bots, workers, relays). avacli supervises them with a cross-platform ProcessSupervisor (POSIX fork+setpgid, Windows CreateProcessW + Job Object), auto-restarts per your policy, and streams logs in real time.
~/.avacli/services/<id>/ (configurable). Contains venv/, node_modules/, logs/, state/.deps.auto_install: true — first start runs python -m venv + pip install -r requirements.txt (or npm ci). Re-runs only when the hash of requirements.txt / package.json changes.always, on_failure, never — with exponential backoff and a per-hour restart ceiling.vault:token_name and avacli resolves them at spawn from the encrypted vault — plaintext secrets never touch services.config.GET /api/services/:id/logs/stream.tail_service_logs is available in Question mode for diagnosing without escalating.Every agent-generated app gets its own authenticated database at ~/.avacli/app_data/<slug>.db. Opened with WAL + foreign keys and a sqlite3_set_authorizer hook that denies ATTACH / DETACH and locks the app inside its own sandbox.
_sdk.js exposes window.avacli.db.query(sql, params), .execute, .schema, .main.query(view, params), and .ai.chat({messages, model}). Zero plumbing; the slug + agent_token are baked into the served JS.articles_public, apps_directory, and my_app from the main database via /api/apps/:slug/main/query. Raw SQL against avacli.db is never exposed.apps_db_size_cap_mb=256 (range 16–16384) with per-app override. Over-quota writes return HTTP 507; reads still work so an app can drain its data.Authorization: Bearer <agent_token> + same-origin referer. Master-key auth is bypassed for these routes.The agent can delegate bounded work to a child xAI agent with spawn_subagent({goal, allowed_paths, allowed_tools, model?}). Each child runs on its own thread, with tool access gated by a ScopedToolExecutor and writes serialised by a LeaseManager.
subagents_max_depth=0 disables spawn entirely. Raising the slider in Settings to 1–16 enables progressively deeper nesting. No restart required.allowed_paths (glob, canonicalised) and allowed_tools (intersected with Agent-mode set). run_shell and run_tests are denied unless explicitly whitelisted.INSERT OR FAIL INTO agent_leases. Second writer gets {"error":"resource_locked","holder":"<task_id>"} — use wait_subagent(holder) or re-plan.pending → running → succeeded / failed / cancelled. wait_subagent blocks via condition_variable instead of polling SQLite.LLMClient indirection. Children share the same XAIClient as the root — only the model string varies.The Settings page in the web UI now has a “Platform capabilities” section with three controls:
| Key | Default | Description |
|---|---|---|
services_workdir_root | ~/.avacli/services | Root directory for per-service workdirs (venv, node_modules, logs, state). Must be an absolute path with an existing parent. |
apps_db_size_cap_mb | 256 | Global default size cap per app database (range 16–16384 MB). Per-app override via app_db_set_cap. |
subagents_max_depth | 0 (disabled) | Maximum nesting depth for sub-agents (range 0–16). 0 disables spawn_subagent entirely. |
All three are read fresh on every call. Changes take effect immediately — no server restart.
| Method | Path | Description |
|---|---|---|
| GET | /api/services/:id/status | Live process details (alive, pid, uptime_s, restart_count, last_exit_code). |
| POST | /api/services/:id/signal | {signal: "term" | "kill" | "restart"}. |
| GET | /api/services/:id/logs/stream | SSE tail with 20s heartbeat. |
| POST | /api/apps/:slug/db/{query,execute} | Per-app SQLite query / mutation (requires agent_token). |
| GET | /api/apps/:slug/db/{schema,export} | Inspect schema or dump the app DB. |
| POST | /api/apps/:slug/main/query | Whitelisted view access to the main DB. |
| GET | /apps/:slug/_sdk.js | Auto-injected client SDK. |
| GET | /api/tasks | List sub-agent tasks (filter by parent / status). |
| POST | /api/tasks / /api/tasks/:id/cancel | Spawn or cancel a sub-agent task. |
v2.3 stops avacli from evicting its own prompt cache and layers xAI’s Batch API on top. Cached input on grok-4.20-* bills at ~10× less than fresh input (~4× on grok-4-1-fast); the Batch API adds a flat 50% discount that stacks multiplicatively — a cached batched request bills at roughly 5% of the sticker price. All five phases landed in a single release.
Before v2.3, the agent re-wrote messages[0] on every tool turn with the session’s current TODOs, memory, and recent edits — which invalidated xAI’s KV cache every single turn (effective hit rate: ~0%). The system prompt is now split into two layers:
GROK_SYSTEM_PROMPT.md + workspace env. Set once per run() entry, byte-identical across turns, cached by xAI.conv_id per sessionA byte-stable prefix isn’t enough — xAI has a pool of servers, and without a routing hint your turn 2 can land on a cold box. Every avacli session now mints a conversation id once and reuses it forever:
x-grok-conv-id: conv-<16hex> HTTP header.prompt_cache_key body field with the same value.conv_id and never regenerated — renaming a session doesn’t evict the cache. Old sessions get a fresh id on first load.parent_conv_id + ":" + task_id, so siblings share routing locality and reuse each other’s cached sub-agent scaffolding.You can’t optimise what you can’t measure. Every response now reports cached-token counts across every surface:
[cache] prompt=8654 cached=7800 ratio=90% fires on every response (grep journalctl -u avacli | grep "\[cache\]")./api/chat usage and done events include cached_tokens, reasoning_tokens, billable_prompt_tokens, cache_hit_rate.avacli --format json output carries the same breakdown.result.usage on spawn_subagent / wait_subagent carries the full token breakdown per child./usage page now attributes savings over time by rolling up cached_tokens and reasoning_tokens from usage_records.previous_response_id)When streaming from /v1/responses, xAI stores the server-side conversation state for 30 days. avacli now reuses it so follow-up turns send only the new input items — the server replays everything before it (including the encrypted reasoning trace) and the whole chain hits the same cache.
AgentEngine tracks lastResponseId and lastSubmittedCount per session, slices apiMessages to the tail on chained calls, and skips instructions when the server already has the prefix.MAX_TOOL_TURNS and when a caller clears history (use_history:false), so we never chain into a mismatched server state.AgentEngine::setUsePreviousResponseId(false) disables chaining entirely. Default: on.Reasoning models (grok-4-reasoning, grok-4-fast-reasoning) stream a separate reasoning_content delta on the chat-completions path. avacli used to inline it into assistant content as <thought>…</thought>, which broke xAI’s reasoning cache on replay — the round-tripped messages no longer matched the originally-cached ones.
v2.3 routes raw reasoning_content into a dedicated Message::reasoning_content field, persists it per-message across save/load, and echoes it back on subsequent request bodies. The on-wire message is byte-identical to what the model produced, so the reasoning cache stays warm. The UI still receives <thought>…</thought> via the existing onStreamDelta path — transcript splitters are unchanged.
xAI’s Batch API (/v1/messages/batches) returns a flat 50% discount on every token (input, cached, output, reasoning) and gives the service 24 hours to complete the work. Stacks multiplicatively with prompt caching — a cached batched request bills at 50% × ~10% = ~5% of the sticker price.
Submit a batch from the agent:
> Use batch_submit to summarise these 50 articles into tweets.
→ batch_submit({
name: "article-tweets-2026-04",
kind: "chat_completions",
model: "grok-4.20-fast",
prompts: [
{ system: "You write tweets.", user: "", max_tokens: 280 },
... 49 more ...
]
})
← { "batch_id": "batch_abc123", "num_requests": 50, "state": "processing" }
The background BatchPoller thread (default 30s tick, kicked on submit) reconciles state and snapshots results to batches.results_json once terminal — subsequent get_batch_results calls serve straight from the local SQLite row with zero xAI round-trips.
Ideal workloads (all save ~5× on tokens):
Not for interactive chat — batches are explicitly async, typically complete in minutes to hours, and batch_submit is gated to tool-mode ≥ 2 (writes) so read-only sessions can’t accidentally queue paid jobs.
| Method | Path | Description |
|---|---|---|
| GET | /api/batches | List batches with optional owner / kind / state filters. |
| POST | /api/batches | Create a batch. Accepts {name, kind, requests:[…]} to pre-populate, or just {name, kind} to populate via a later POST /api/batches/:id/requests. |
| GET | /api/batches/:id | Returns the merged local + live snapshot including cost, pending / succeeded / failed counts, and expiry. |
| GET | /api/batches/:id/results?limit=&page_token= | Paginated results (cache-first, falls through to xAI when the local snapshot is empty). |
| POST | /api/batches/:id/cancel | Cancel an in-flight batch. |
| POST | /api/batches/:id/refresh | Force a single-batch poll (also triggered automatically). |
| POST | /api/batches/refresh | Kick the global poller (reconcile all processing rows now). |
| DELETE | /api/batches/:id | Remove the local record (does not call xAI). Useful when you’ve archived results elsewhere. |
All endpoints are JWT-guarded and owner-scoped. Batch state is persisted in the SQLite batches table (migration v7) with cost stored in 10−10 USD ticks (int64_t) to keep arithmetic free of float drift.
On a typical tool-heavy session, prompt-cache hit rate climbs from ~0% on turn 1 to 70–90% from turn 2 onward, visible in the [cache] log line and the cache_hit_rate field on SSE done events. Combined with reasoning preservation and Responses-API chaining, interactive costs on grok-4.20-* drop by roughly 5–10× for long sessions — and by ~20× for any workload you can push into a batch.
You need a C++17 compiler, CMake 3.16+, and a few system libraries:
sudo apt install build-essential cmake libcurl4-openssl-dev libssl-dev libjsoncpp-dev
git clone https://github.com/iBerry420/acos.git
cd acos
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)
The resulting avacli binary is in build/. Copy it to /usr/local/bin/ or run it directly.
avacli is MIT-licensed and welcomes contributions. The repository lives at github.com/iBerry420/acos.
main.clang-format if available.cmake --build . -j passes before submitting.