
Why We Built VoidLLM


About a year ago I was setting up an LLM gateway for a small team. The requirements looked simple: route requests to a few providers, enforce rate limits, keep API keys out of application code, and make sure no prompt content ever leaves our infrastructure.

That last requirement turned out to be the hard part.

The problem

LiteLLM is the obvious choice - but it’s a Python monolith with a massive dependency tree, slow startup, and a feature surface we didn’t need. Portkey and Helicone are polished, but cloud-first: your prompts transit their infrastructure. For a team bound by GDPR, that’s a non-starter.

For a long time we ran without a proxy. Services in Kubernetes hit vLLM directly - network policies were the only access control. It worked until we needed to know which team was burning through GPU hours. At that point we had two options: bolt observability onto every service, or put a proxy in front that handles auth, tracking, and routing in one place. We chose the proxy.

How VoidLLM works

graph LR
  A1[Service A] --> B[VoidLLM Proxy]
  A2[Service B] --> B
  A3[Service C] --> B
  A4[Cronjob] --> B
  B -->|Route by model| C{Model Registry}
  C -->|anthropic| D[Anthropic API]
  C -->|openai| E[OpenAI API]
  C -->|vllm| F[Self-Hosted vLLM]
  B -->|"Metadata only"| G[(Usage DB)]
  style B fill:#8b5cf6,stroke:#6366f1,color:#fff
  style G fill:#12121a,stroke:#8b5cf6,color:#e2e8f0
Figure: VoidLLM sits between your apps and LLM providers. Content flows through - only metadata is stored.

The proxy reads the model field from the request, resolves it through the registry, and streams bytes to the right upstream. No content touches disk. Usage events track who, which model, how many tokens - nothing else.

Zero-Knowledge by design

There’s no “disable content logging” toggle - because there’s no content logging code to disable. Read more in Zero-Knowledge by Architecture, Not by Policy.

What we built

Go + Fiber. Sub-2ms proxy overhead. The binary embeds the entire admin UI - no separate frontend, no Node.js. Copy one binary, write a config, start it.

SQLite default, PostgreSQL optional. The Community tier is free and stays free: RBAC, rate limiting, usage tracking, load balancing, MCP gateway, automatic failover, and a full admin UI.

Paid tiers add SSO, audit logs, OpenTelemetry, and multi-instance support. Flat pricing - no per-request or per-seat fees.

What’s next

The GitHub repo is where it happens. Star it if you’re interested.
