arbstr: LLM Routing with Bitcoin Settlement

AI inference has a pricing problem and an economic identity problem. arbstr is a three-part ecosystem — routing proxy, treasury service, and deployment layer — that routes each LLM request to the cheapest qualified provider and settles costs in Bitcoin over Lightning.

Why I Built This

LLM inference pricing is wildly inconsistent. Multiple providers offer the same models at different rates priced in satoshis: for example, Provider A might charge 8 sats per 1k tokens, Provider B 12 sats, and Provider C 10 sats. On Routstr, a decentralized LLM marketplace, those spreads are real, and exploiting them is valuable even with a basic routing algorithm.

The deeper problem is that AI agents have no native economic identity. When an agent makes thousands of API calls a day, there is no principled way to meter its spend, enforce a daily budget, or audit what it paid for. Billing is either absent or shunted onto human-managed API keys, which have no per-agent granularity.

arbstr was built to solve both problems in one deployable stack: route each request to the cheapest qualified provider automatically, while metering spend against isolated agent sub-accounts backed by Bitcoin. No tokens, no SaaS payment rails — just sats and Lightning.

How It Works

A request arrives at arbstr core's OpenAI-compatible proxy. The vault reserves the estimated cost from the agent's balance, the router selects the cheapest qualified provider, and the response streams back. The vault then settles the actual cost and releases the unused part of the reservation, all within a single request lifecycle.

Your App ──> arbstr core ──> arbstr-vault (debit buyer)
                 │
                 ├──> mesh-llm node (local / free)
                 ├──> Routstr (marketplace)
                 └──> any OpenAI-compatible endpoint
                 │
            arbstr-vault (credit) ──> Lightning payout

The ecosystem spans three repositories, each with a distinct role:

arbstr core is the routing engine — a Rust binary (axum/Tokio) that exposes an OpenAI-compatible API at /v1/chat/completions. A policy engine determines which routing rule applies — either via the explicit X-Arbstr-Policy header or keyword heuristics that scan message content. Providers are filtered by allowed models and cost constraints, and the cheapest remaining provider is selected. Circuit breakers (Closed/Open/Half-Open) provide per-provider resilience with automatic recovery probing. An intelligent complexity tier system routes simple requests to local or free providers and escalates complex ones to frontier models. Mock mode allows local testing without real API calls.
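The filter-then-cheapest step can be sketched in a few lines. This is an illustration in TypeScript, not arbstr core's Rust code: the Provider and Policy shapes, the field names, the circuitOpen flag, and the "example-model" identifier are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not arbstr core's actual types.
interface Provider {
  name: string;
  models: string[];         // models this provider serves
  satsPer1kTokens: number;  // quoted price
  circuitOpen: boolean;     // true while this provider's breaker is Open
}

interface Policy {
  allowedModels: string[];
  maxSatsPer1kTokens: number; // cost constraint
}

// Drop providers that fail the policy or are tripped, then take the cheapest.
function selectProvider(
  providers: Provider[],
  model: string,
  policy: Policy,
): Provider | undefined {
  if (!policy.allowedModels.includes(model)) return undefined;
  const qualified = providers.filter(
    (p) =>
      !p.circuitOpen &&
      p.models.includes(model) &&
      p.satsPer1kTokens <= policy.maxSatsPer1kTokens,
  );
  return qualified.reduce<Provider | undefined>(
    (best, p) =>
      best === undefined || p.satsPer1kTokens < best.satsPer1kTokens ? p : best,
    undefined,
  );
}

// Example prices only; provider A is treated as temporarily unavailable.
const exampleProviders: Provider[] = [
  { name: "A", models: ["example-model"], satsPer1kTokens: 8, circuitOpen: true },
  { name: "B", models: ["example-model"], satsPer1kTokens: 12, circuitOpen: false },
  { name: "C", models: ["example-model"], satsPer1kTokens: 10, circuitOpen: false },
];
const examplePolicy: Policy = {
  allowedModels: ["example-model"],
  maxSatsPer1kTokens: 15,
};
const chosen = selectProvider(exampleProviders, "example-model", examplePolicy);
console.log(chosen?.name);
```

With provider A's breaker open, the sketch falls back to C, the cheapest provider still qualified. Returning undefined when nothing qualifies leaves the caller to decide how to reject the request.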

arbstr-vault is the treasury service: a Fastify/TypeScript application backed by SQLite in WAL mode, accessed through Drizzle ORM. Each agent gets an isolated sub-account with a bearer token for authentication. A reserve/settle/release billing pattern makes async wallet calls crash-safe: before routing, vault reserves the estimated cost ceiling; after the response streams, vault settles the actual cost and releases the unused remainder. The policy engine enforces per-agent maximum transaction size and daily spend limits, and it fails closed on error. Over-limit transactions enter a human-approval queue with a configurable timeout. Wallet backends are pluggable: simulated (development), Lightning via LND gRPC, Cashu via a self-hosted Nutshell mint, or auto dual-rail routing based on amount. Every ledger update is atomic with an append-only audit log.
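The reserve/settle/release pattern and the limit checks can be sketched with a small in-memory model. This is not arbstr-vault's code: the Account shape, the function names, and the decision labels are assumptions for illustration, and the real service persists state in SQLite rather than in an object.

```typescript
// Hypothetical in-memory model; names and shapes are illustrative only.
type Decision = "reserved" | "needs_approval" | "insufficient_funds" | "denied";

interface Account {
  balance: number;        // spendable sats
  reserved: number;       // sats held for in-flight requests
  spentToday: number;     // sats settled in the current day
  maxTxSats: number;      // per-transaction limit
  dailyLimitSats: number; // daily spend limit
  audit: string[];        // append-only log of every ledger change
}

// Hold the estimated cost ceiling before the request is routed.
function reserve(acct: Account, estimate: number): Decision {
  // Fail closed: anything that is not a positive integer amount is denied.
  if (!Number.isInteger(estimate) || estimate <= 0) {
    acct.audit.push(`denied ${estimate}`);
    return "denied";
  }
  const overDaily =
    acct.spentToday + acct.reserved + estimate > acct.dailyLimitSats;
  if (estimate > acct.maxTxSats || overDaily) {
    acct.audit.push(`needs_approval ${estimate}`);
    return "needs_approval"; // would enter the human-approval queue
  }
  if (estimate > acct.balance) {
    acct.audit.push(`insufficient_funds ${estimate}`);
    return "insufficient_funds";
  }
  acct.balance -= estimate;
  acct.reserved += estimate;
  acct.audit.push(`reserve ${estimate}`);
  return "reserved";
}

// After the response streams: charge the actual cost, release the rest.
function settle(acct: Account, estimate: number, actual: number): void {
  const charged = Math.min(actual, estimate); // never exceed the hold
  acct.reserved -= estimate;
  acct.balance += estimate - charged;
  acct.spentToday += charged;
  acct.audit.push(`settle ${charged} release ${estimate - charged}`);
}

// If the request fails before completing: return the whole hold.
function release(acct: Account, estimate: number): void {
  acct.reserved -= estimate;
  acct.balance += estimate;
  acct.audit.push(`release ${estimate}`);
}

const agent: Account = {
  balance: 1000, reserved: 0, spentToday: 0,
  maxTxSats: 200, dailyLimitSats: 500, audit: [],
};
const first = reserve(agent, 150);  // within limits: held
settle(agent, 150, 90);             // actual cost was lower than the hold
const second = reserve(agent, 250); // exceeds maxTxSats
console.log(first, second, agent.balance, agent.spentToday);
```

In this sketch the hold comes out of the spendable balance up front, so a crash after the response streams leaves a reservation that can be reconciled later instead of an unbilled request. A persistent implementation would write the balance change and the audit entry in the same database transaction.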

arbstr-node is the deployment layer — a Docker Compose repo with no compiled code of its own. One docker compose up -d starts four services: arbstr core (port 8080), arbstr-vault (port 3010), LND (gRPC 10009), and Nutshell Cashu mint (port 3338). Configuration is split between .env for secrets and config.toml for provider routing rules and policies. The repo uses submodules to pull in the core and vault source.

What I Learned

Building a proxy that is genuinely drop-in compatible with the OpenAI API is harder than it looks. The protocol has subtle behavior around streaming, error responses, and token counting that existing clients depend on. Matching those semantics exactly — including server-sent event passthrough and correlation ID propagation — was more work than the routing logic itself.

The reserve/settle/release billing pattern in arbstr-vault was the key architectural decision for correctness. Async wallet calls can fail after the inference response has already streamed. Without reserve-first/settle-after, a crash between request completion and billing would silently undercharge agents. The append-only audit log, written atomically with every ledger update, makes the audit trail reliable even under partial failures.

Even without sophisticated ML, just picking the cheapest provider at request time saves 20–30 percent on inference costs. The price spreads in decentralized marketplaces are real.
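As a rough sanity check, the arithmetic can be run against the illustrative 8/12/10 sat prices quoted earlier. This is a worked example on those example numbers only, not measured data, and the savingsPercent helper is a hypothetical name introduced here.

```typescript
// Illustrative prices from the example above, in sats per 1k tokens.
const quotedSatsPer1k = [8, 12, 10];

// Percentage saved by paying `cheapest` instead of `baseline`, rounded.
function savingsPercent(cheapest: number, baseline: number): number {
  return Math.round((100 * (baseline - cheapest)) / baseline);
}

const cheapestPrice = Math.min(...quotedSatsPer1k);
const averagePrice =
  quotedSatsPer1k.reduce((sum, p) => sum + p, 0) / quotedSatsPer1k.length;
const highestPrice = Math.max(...quotedSatsPer1k);

const vsAverage = savingsPercent(cheapestPrice, averagePrice); // 20
const vsHighest = savingsPercent(cheapestPrice, highestPrice); // 33
console.log(vsAverage, vsHighest);
```

Against those example prices, always taking the cheapest quote saves 20 percent relative to the average price and about 33 percent relative to the most expensive one. Real savings depend on the actual spreads and on which baseline they are compared against.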

The honest lesson from the vault side: holding wallet keys and connecting to real payment rails is a fundamentally different risk category than building a routing proxy. The arbstr-vault README's CAUTION is not boilerplate — the system is experimental, has not been audited, and is not hardened for production. The architecture is correct in principle, but the gap between "correct in principle" and "safe with real funds" is large. That gap is the next phase of work.

Note: arbstr is early-stage software. arbstr-vault is experimental, not hardened, not audited, and not production-ready. Do not use with real money.


Economic infrastructure for AI agents, settled in sats.