About a year ago I was setting up an LLM gateway for a small team. Route requests to a few providers, enforce rate limits, keep API keys out of application code, and make sure no prompt content ever leaves our infrastructure.
That last requirement turned out to be the hard part.
LiteLLM is the obvious choice - but it’s a Python monolith with a massive dependency tree, slow startup, and a feature surface we didn’t need. Portkey and Helicone are polished, but cloud-first. Your prompts transit their infrastructure. For GDPR, that’s a non-starter.
For a long time we ran without a proxy. Services in Kubernetes hit vLLM directly - network policies were the only access control. It worked until we needed to know which team was burning through GPU hours. At that point we had two options: bolt observability onto every service, or put a proxy in front that handles auth, tracking, and routing in one place. We chose the proxy.
graph LR
A1[Service A] --> B[VoidLLM Proxy]
A2[Service B] --> B
A3[Service C] --> B
A4[Cronjob] --> B
B -->|Route by model| C{Model Registry}
C -->|anthropic| D[Anthropic API]
C -->|openai| E[OpenAI API]
C -->|vllm| F[Self-Hosted vLLM]
B -->|"Metadata only"| G[(Usage DB)]
style B fill:#8b5cf6,stroke:#6366f1,color:#fff
style G fill:#12121a,stroke:#8b5cf6,color:#e2e8f0 The proxy reads the model field from the request, resolves it through the registry, and streams bytes to the right upstream. No content touches disk. Usage events track who, which model, how many tokens - nothing else.
ℹZero-Knowledge by design
There’s no “disable content logging” toggle - because there’s no content logging code to disable. Read more in Zero-Knowledge by Architecture, Not by Policy.
Go + Fiber. Sub-2ms proxy overhead. The binary embeds the entire admin UI - no separate frontend, no Node.js. Copy one binary, write a config, start it.
SQLite default, PostgreSQL optional. The Community tier is free and stays free: RBAC, rate limiting, usage tracking, load balancing, MCP gateway, automatic failover, and a full admin UI.
Paid tiers add SSO, audit logs, OpenTelemetry, and multi-instance support. Flat pricing - no per-request or per-seat fees.
The GitHub repo is where it happens. Star it if you’re interested.
Most LLM proxies log your prompts. The EU AI Act makes that a compliance problem. Here's how VoidLLM's architecture simplifies things.
Code Mode is great, but you shouldn't need an LLM proxy to use it. VoidMCP is a separate Go binary that brings Code Mode and the MCP gateway to anyone who just wants their IDE to talk to multiple MCP servers.
How VoidLLM enforces privacy at the code level - no content logging, no opt-out, no exceptions.
VoidLLM's Code Mode lets AI agents orchestrate multiple MCP tool calls in a single WASM-sandboxed JavaScript execution. No round-trips, no latency penalty.