POST /v1/chat/completions · json_mode · ctx 32k · region us
Every LLM request, routed and accountable.
Tell is a Rust-based gateway between your product and every LLM provider. It caches, routes, redacts, and audits, at sixty thousand requests per second, with sub-millisecond overhead.
No SDK rewrite. No vendor lock-in. No quiet model swaps that hit your customers before they hit your dashboard.
- Throughput
- 60k+ rps
- Overhead
- < 1 ms
- Mean response
- 4.3 ms
- Failures @ 10k conc.
- 0
When this matters
If you ship LLM features, you've already met these problems.
Tell is for teams whose product talks to a model on every meaningful click, and whose ops, finance, and legal teams are starting to notice.
You picked a provider, and now you live there
One SDK, one bill, one failure mode. The model that beat yours on price last week sits on the other side of a rewrite you keep deferring.
Spend is creeping and nobody owns it
The same prompt is generating the same answer for the tenth time today. Two teams are calling GPT-5 for tasks 4o-mini would solve. There is no budget by team, no cap by project.
Outages are silent until they are loud
A model gets deprecated. A fingerprint flips overnight. A region quietly throttles. You find out the same way your customers do: through the support inbox.
Compliance keeps asking what left the building
PII rides along in the prompts. Secrets land in the logs. There is no audit trail that survives a regulator, and no redaction layer that survives a code review.
How it works
From your client to a provider, through one accountable hop.
Tell sits on the wire. Every prompt your product makes, every response a model returns, passes through the same place, and gets the same treatment.
Point at the URL
Tell speaks the OpenAI API. Swap one base URL in your existing client; keys, types, and streaming all keep working. No SDK rewrite, no new abstractions to learn.
Solve before you spend
A constraint solver filters providers by region, context length, cost ceiling, and JSON mode. A multi-objective bandit then scores the survivors on cost, latency, and quality.
Cache, redact, route
A semantic cache check answers similar prompts in under a millisecond. PII and secrets are redacted at the edge. The chosen provider streams the response back in coherent chunks.
Log every byte
Prompt, params, route decision, tool traces, fingerprints, confidence map, all attached to one trace. Sealed originals; redacted day-to-day logs; audit-ready exports on demand.
What ships
Cheaper traffic. Calmer outages. One audit trail.
Three things drop into place the day Tell goes live in front of your stack.
- 01
A router that picks the cheapest model that still works
Constraints first, scoring second. Fewer wasted GPT-5 calls on tasks 4o-mini would handle; fewer truncated 4o-mini calls on prompts that actually needed the bigger context.
- 02
A semantic cache that pays for itself
Vector lookups on incoming prompts. Forty to sixty percent of traffic stops short of a provider; the answer was already on the shelf, ten characters of paraphrase away.
- 03
A single audit trail across every provider
Prompts, params, route decisions, fingerprints, redactions, all in one trace. The forensics bundle attaches itself to the incident, not to the engineer who happens to be on call.
Live traffic · last 60 seconds3.7M reqs58% cached$214 spend“summarise this 12k-token transcript”
anthropic / sonnet-4312 ms“classify ticket sentiment”
↳ served from semantic cache
cache · 0.94 sim0.8 ms“extract invoice fields → JSON”
↳ json repair · 1 fixup
openai / gpt-4.1198 ms“draft customer email re: refund”
↳ PII redacted · 2 spans
bedrock / haiku-4267 ms
POST /v1/chat/completions · json_mode · ctx 32k · region us
Why Tell
A gateway is the right place for this.
The same problems show up in every LLM stack. Solving them once, on the wire, beats solving them four times in four apps.
01
It's a gateway, not a wrapper.
It lives on the wire. Every request, every retry, every failover passes through the same place, so policy, cost, and audit are written once, not re-implemented per app.
02
It's built for the day a provider changes.
Model fingerprints watched, behavioural contracts enforced, shadow traffic compared. When a provider quietly swaps a checkpoint, you find out before your customers do.
What Tell isn't
Three things people assume, and the actual answer.
The category is full of half-fits. Here's what makes Tell structurally different.
Not a thin proxy
It routes on cost, latency, and quality. It repairs JSON output to schema. It hedges tail latency across providers and cancels on first success. A reverse proxy doesn't.
Not an SDK abstraction
Your code keeps using the OpenAI client. Tell is the URL it points at. No new types, no rewrites, no second SDK to lock yourself into.
Not a separate observability tool
Request, response, prompt diff, route decision, cost, confidence: same trace. One pane, not three integrations and a Looker dashboard.
Common questions
What teams ask before they put it on the wire.
01Do we have to rewrite our app?
No. Tell speaks the OpenAI API. Point your existing OpenAI client at Tell's URL: keys, types, streaming, tool calls all keep working. The migration is a config change, not a project.
02Where does it run?
Hosted, per-region, with a global control plane and per-region data planes. Or BYOC inside your VPC, where the data plane stays in your residency zone and the control plane stays out.
03What happens when a provider goes down?
Smart fallback. Tell holds the chosen model, hedges to a backup if tail latency spikes, and routes around outages without changing the request your client sent. The first you'll hear about it is the dashboard.
04Is our data safe?
PII and secrets are redacted at the edge before egress. Day-to-day logs are redacted; sealed originals sit in immutable storage with break-glass controls and per-tenant KMS keys.
05Can we run our own models behind it?
Yes. Self-hosted endpoints (vLLM, TGI, in-house Triton, anything that speaks OpenAI or a known schema) join the routing pool alongside hosted providers; same constraints, same scoring, same audit.
Pricing
Pay for completions, not seats.
Cached calls are free. Routed calls are billed at provider cost plus a thin margin. The incentive sits on the right side.
- One workspace · all providers
- Semantic cache + smart routing
- Direct line to the engineering team
- Pricing fixed at beta rate after launch
- Unlimited prompts and providers
- Per-team budgets and virtual keys
- PII redaction · audit log exports
- SOC2-ready data handling
- Email + Slack support
- Self-hosted or VPC-isolated
- Per-tenant KMS keys (BYOK)
- SSO, SAML, SCIM
- Dedicated solutions engineer
- Procurement-friendly contracts
Plan 01
Pilot
Free · 90 days
For closed-beta partners
Drop Tell in front of one product surface. All features unlocked, weekly check-ins, and pricing locked in for the first year.
Plan 02
Team
From $499/mo
For shipping LLM teams
Hosted, multi-region, full feature set. Usage-based billing on completions that actually leave the gateway; cached calls are free.
Plan 03
Enterprise
Custom
BYOC data plane
Single-tenant deployment in your VPC. BYOK encryption, dedicated SRE, hard multi-tenancy across business units.
§ 03 · Engagement intake
01 / Start a brief
Talk to the people who’ll do the work.
We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.
If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.
02 / Where to find us
Montreal, Quebec
Satellite office- Lat / Lng
- 45.5089°N · 73.5542°W
- Local
- —:— EST · UTC−05