arbstr¶
GitHub Repo | Updated: February 2026
An intelligent LLM routing proxy that exploits price differences in the Routstr decentralized marketplace. Point your app at arbstr and it automatically sends each request to the cheapest provider, saving you money without changing a line of code.
Tech Stack:
- Rust
- OpenAI-compatible API (drop-in proxy)
- Routstr marketplace integration
- Policy engine with keyword matching
Why I Built This¶
LLM inference pricing is wildly inconsistent. The price of the same model can vary by 50 percent or more across providers. On Routstr, a decentralized LLM marketplace, multiple providers offer identical models at different rates priced in satoshis: Provider A charges 8 sats per 1k tokens, Provider B charges 12 sats, Provider C charges 10 sats.
If you're making thousands of API calls a day, those price differences add up fast. But manually checking rates and switching providers is tedious. You'd have to monitor the marketplace, compare prices, update your API configuration, and do it all over again when rates change.
I wanted a proxy that sits between my applications and the marketplace, automatically routing each request to the cheapest available provider. No manual configuration. No vendor lock-in. Just point your existing OpenAI SDK at the proxy and let it optimize costs in real time.
The challenge was building a routing engine that respects quality and policy constraints while still optimizing for price. You can't just pick the absolute cheapest option every time. Some prompts need a specific model. Some tasks have a cost ceiling. Some requests require providers with certain capabilities.
How It Works¶
arbstr is an OpenAI-compatible HTTP proxy. Your application sends a standard /v1/chat/completions request, and arbstr routes it to the best provider based on cost and policy.
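Because the proxy speaks the OpenAI API, any existing client works unchanged. A minimal sketch with reqwest, assuming arbstr is listening on localhost:3000 (the address and port are illustrative):

```rust
// Client sketch: a standard /v1/chat/completions request sent to arbstr
// instead of OpenAI. The proxy address (localhost:3000) is an assumption.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let resp = client
        .post("http://localhost:3000/v1/chat/completions")
        .json(&json!({
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "user", "content": "Write a function that reverses a string."}
            ]
        }))
        .send()
        .await?
        .text()
        .await?;
    println!("{resp}");
    Ok(())
}
```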
Request Flow:
Application → arbstr Proxy → Policy Engine → Router
                                               ↓
                                Routstr Provider (cheapest match)
When a request arrives, the policy engine first determines which policy applies. You can explicitly set a policy via the X-Arbstr-Policy header, or arbstr can match policies automatically by scanning message content for keywords. For example, if the message contains "code", "function", or "debug", it might match a "code_generation" policy.
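A minimal sketch of that matching step (the policy names and fields here are illustrative, not arbstr's actual schema):

```rust
// Keyword-based policy matching, sketched. Policies and fields are
// illustrative; arbstr's real configuration may differ.
struct Policy {
    name: &'static str,
    keywords: &'static [&'static str],
}

/// Return the first policy whose keywords appear in the message content.
fn match_policy<'a>(policies: &'a [Policy], content: &str) -> Option<&'a Policy> {
    let text = content.to_lowercase();
    policies
        .iter()
        .find(|p| p.keywords.iter().any(|kw| text.contains(kw)))
}

fn main() {
    let policies = [
        Policy { name: "code_generation", keywords: &["code", "function", "debug"] },
        Policy { name: "summarization", keywords: &["summarize", "tl;dr"] },
    ];
    let hit = match_policy(&policies, "Debug this function for me");
    assert_eq!(hit.map(|p| p.name), Some("code_generation"));
}
```

First match wins in this sketch; a real engine might score keyword overlap or fall back to a default policy when nothing matches.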
Once the policy is matched, the router filters providers by allowed models and cost constraints from that policy. Then it calculates the total cost for each remaining provider (input rate + output rate + base fee) and selects the cheapest.
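In code, that filter-then-rank step might look like the following sketch. The Provider fields and the flat rate sum mirror the formula above; a real estimate would weight the rates by expected token counts:

```rust
// Selection sketch: filter by the policy's allowed models and cost
// ceiling, then pick the cheapest. Field names are illustrative.
#[derive(Debug)]
struct Provider {
    name: &'static str,
    model: &'static str,
    input_rate: f64,  // sats per 1k input tokens
    output_rate: f64, // sats per 1k output tokens
    base_fee: f64,    // sats per request
}

fn total_cost(p: &Provider) -> f64 {
    p.input_rate + p.output_rate + p.base_fee
}

fn cheapest<'a>(
    providers: &'a [Provider],
    allowed_models: &[&str],
    max_cost: f64,
) -> Option<&'a Provider> {
    providers
        .iter()
        .filter(|p| allowed_models.contains(&p.model))
        .filter(|p| total_cost(p) <= max_cost)
        .min_by(|a, b| total_cost(a).total_cmp(&total_cost(b)))
}

fn main() {
    // Rates echo the example spread from earlier: 8, 12, and 10 sats.
    let providers = [
        Provider { name: "A", model: "llama-3-70b", input_rate: 8.0, output_rate: 8.0, base_fee: 0.0 },
        Provider { name: "B", model: "llama-3-70b", input_rate: 12.0, output_rate: 12.0, base_fee: 0.0 },
        Provider { name: "C", model: "llama-3-70b", input_rate: 10.0, output_rate: 10.0, base_fee: 0.0 },
    ];
    let pick = cheapest(&providers, &["llama-3-70b"], 100.0);
    assert_eq!(pick.map(|p| p.name), Some("A"));
}
```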
The request is forwarded to the selected provider with full streaming support. Server-sent events pass through transparently. The client gets the exact response format it expects from the OpenAI API. Every request gets a correlation ID (UUID) for tracing through the system.
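Roughly, the forwarding path can be sketched as below, assuming axum and reqwest (the stack mentioned later in this post). Real arbstr presumably copies request headers, handles upstream errors, and gets the URL from the router rather than hardcoding it; the response header name here is hypothetical:

```rust
// Passthrough sketch (not arbstr's actual handler): tag each request with
// a UUID correlation ID and stream the provider's bytes straight back, so
// SSE chunks reach the client untouched.
use axum::{body::Body, response::Response, routing::post, Router};
use uuid::Uuid;

async fn proxy(body: String) -> Response {
    let correlation_id = Uuid::new_v4(); // traces the request through the system

    // The upstream URL would come from the router; hardcoded for the sketch.
    let upstream = reqwest::Client::new()
        .post("https://provider.example/v1/chat/completions")
        .header("content-type", "application/json")
        .body(body)
        .send()
        .await
        .expect("upstream request failed");

    Response::builder()
        .status(upstream.status().as_u16())
        .header("x-arbstr-correlation-id", correlation_id.to_string())
        // from_stream forwards chunks as they arrive, preserving
        // server-sent events and backpressure.
        .body(Body::from_stream(upstream.bytes_stream()))
        .unwrap()
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/v1/chat/completions", post(proxy));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```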
In mock mode, arbstr simulates providers locally for testing without making real API calls. In production, it talks directly to Routstr marketplace providers.
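One plausible shape for that switch (illustrative only; arbstr's actual abstraction may differ) is a backend enum the router dispatches through:

```rust
// Illustrative mock/production split.
enum Backend {
    Mock,                         // canned responses: no network, no sats
    Routstr { base_url: String }, // real marketplace provider
}

impl Backend {
    async fn complete(&self, request_json: String) -> Result<String, reqwest::Error> {
        match self {
            // Simulated provider for local testing.
            Backend::Mock => Ok(
                r#"{"choices":[{"message":{"role":"assistant","content":"mock reply"}}]}"#
                    .to_string(),
            ),
            // Forward to the selected Routstr provider.
            Backend::Routstr { base_url } => {
                reqwest::Client::new()
                    .post(format!("{base_url}/v1/chat/completions"))
                    .header("content-type", "application/json")
                    .body(request_json)
                    .send()
                    .await?
                    .text()
                    .await
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let backend = Backend::Mock;
    let reply = backend.complete("{}".to_string()).await.unwrap();
    println!("{reply}");
}
```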
What I Learned¶
Building a proxy that's truly drop-in compatible is harder than it looks. OpenAI's API has subtle behavior around streaming, error responses, and token counting that clients depend on. I had to match those semantics exactly or risk breaking existing applications.
The policy engine went through several iterations. I started with a simple "lowest cost" strategy, but realized that doesn't work for real-world usage. Some tasks need Claude for reasoning. Some tasks are fine with GPT-4o-mini. The keyword-based heuristics were the compromise: automatic policy matching without requiring clients to change how they call the API.
Rust's async ecosystem (Tokio + axum) made building the HTTP proxy straightforward. Streaming SSE responses required careful handling of backpressure and error propagation, but axum's streaming abstractions handled most of the complexity.
The cost calculation logic is simple now, but modeling future arbitrage opportunities is tricky. Routstr providers can change rates dynamically. Temporal arbitrage (predicting when rates will drop) requires historical data and learning, which is the next phase.
What surprised me most: even without sophisticated ML, just picking the cheapest provider at request time saves 20-30 percent on inference costs. The price spreads in decentralized marketplaces are real, and exploiting them is valuable even with a basic routing algorithm.
Links¶
Let the market decide. Let the router optimize.