Tell/A product from GroupLabsLLM gateway

Every LLM request, routed and accountable.

Tell is a Rust-based gateway between your product and every LLM provider. It caches, routes, redacts, and audits, at sixty thousand requests per second, with sub-millisecond overhead.

No SDK rewrite. No vendor lock-in. No quiet model swaps that hit your customers before they hit your dashboard.

Get early access See how it works →Released · v1.0

tell.gateway / v1Live

rps62.3k

p504.3 ms

cache58%

err0.00%

↘request received·· ms

providers4/4 feasible

openaigpt-5us-east47%

anthropicsonnet-4us-east28%

bedrockhaiku-4us-west16%

vertexgemini-2eu-west9%

semantic cache823 keyshit 58%

decisionstep 1/4

POST /v1/chat/completions · json_mode · ctx 32k · region us

Throughput: 60k+ rps
Overhead: < 1 ms
Mean response: 4.3 ms
Failures @ 10k conc.: 0

When this matters

If you ship LLM features, you've already met these problems.

Tell is for teams whose product talks to a model on every meaningful click, and whose ops, finance, and legal teams are starting to notice.

You picked a provider, and now you live there

One SDK, one bill, one failure mode. The model that beat yours on price last week sits on the other side of a rewrite you keep deferring.

Spend is creeping and nobody owns it

The same prompt is generating the same answer for the tenth time today. Two teams are calling GPT-5 for tasks 4o-mini would solve. There is no budget by team, no cap by project.

Outages are silent until they are loud

A model gets deprecated. A fingerprint flips overnight. A region quietly throttles. You find out the same way your customers do: through the support inbox.

Compliance keeps asking what left the building

PII rides along in the prompts. Secrets land in the logs. There is no audit trail that survives a regulator, and no redaction layer that survives a code review.

How it works

From your client to a provider, through one accountable hop.

Tell sits on the wire. Every prompt your product makes, every response a model returns, passes through the same place, and gets the same treatment.

01Step

Point at the URL

Tell speaks the OpenAI API. Swap one base URL in your existing client; keys, types, and streaming all keep working. No SDK rewrite, no new abstractions to learn.

02Step

Solve before you spend

A constraint solver filters providers by region, context length, cost ceiling, and JSON mode. A multi-objective bandit then scores the survivors on cost, latency, and quality.

03Step

Cache, redact, route

A semantic cache check answers similar prompts in under a millisecond. PII and secrets are redacted at the edge. The chosen provider streams the response back in coherent chunks.

04Step

Log every byte

Prompt, params, route decision, tool traces, fingerprints, confidence map, all attached to one trace. Sealed originals; redacted day-to-day logs; audit-ready exports on demand.

What ships

Cheaper traffic. Calmer outages. One audit trail.

Three things drop into place the day Tell goes live in front of your stack.

01
A router that picks the cheapest model that still works
Constraints first, scoring second. Fewer wasted GPT-5 calls on tasks 4o-mini would handle; fewer truncated 4o-mini calls on prompts that actually needed the bigger context.
02
A semantic cache that pays for itself
Vector lookups on incoming prompts. Forty to sixty percent of traffic stops short of a provider; the answer was already on the shelf, ten characters of paraphrase away.
03
A single audit trail across every provider
Prompts, params, route decisions, fingerprints, redactions, all in one trace. The forensics bundle attaches itself to the incident, not to the engineer who happens to be on call.
Live traffic · last 60 seconds
3.7M reqs58% cached$214 spend
- “summarise this 12k-token transcript”
  anthropic / sonnet-4
  312 ms
- “classify ticket sentiment”
  ↳ served from semantic cache
  cache · 0.94 sim
  0.8 ms
- “extract invoice fields → JSON”
  ↳ json repair · 1 fixup
  openai / gpt-4.1
  198 ms
- “draft customer email re: refund”
  ↳ PII redacted · 2 spans
  bedrock / haiku-4
  267 ms

tell.gateway / v1Live

rps62.3k

p504.3 ms

cache58%

err0.00%

↘request received·· ms

providers4/4 feasible

openaigpt-5us-east47%

anthropicsonnet-4us-east28%

bedrockhaiku-4us-west16%

vertexgemini-2eu-west9%

semantic cache823 keyshit 58%

decisionstep 1/4

POST /v1/chat/completions · json_mode · ctx 32k · region us

Why Tell

A gateway is the right place for this.

The same problems show up in every LLM stack. Solving them once, on the wire, beats solving them four times in four apps.

It's a gateway, not a wrapper.

It lives on the wire. Every request, every retry, every failover passes through the same place, so policy, cost, and audit are written once, not re-implemented per app.

It's built for the day a provider changes.

Model fingerprints watched, behavioural contracts enforced, shadow traffic compared. When a provider quietly swaps a checkpoint, you find out before your customers do.

What Tell isn't

Three things people assume, and the actual answer.

The category is full of half-fits. Here's what makes Tell structurally different.

Not a thin proxy

It routes on cost, latency, and quality. It repairs JSON output to schema. It hedges tail latency across providers and cancels on first success. A reverse proxy doesn't.

Not an SDK abstraction

Your code keeps using the OpenAI client. Tell is the URL it points at. No new types, no rewrites, no second SDK to lock yourself into.

Not a separate observability tool

Request, response, prompt diff, route decision, cost, confidence: same trace. One pane, not three integrations and a Looker dashboard.

Common questions

What teams ask before they put it on the wire.

01Do we have to rewrite our app?

No. Tell speaks the OpenAI API. Point your existing OpenAI client at Tell's URL: keys, types, streaming, tool calls all keep working. The migration is a config change, not a project.

02Where does it run?

Hosted, per-region, with a global control plane and per-region data planes. Or BYOC inside your VPC, where the data plane stays in your residency zone and the control plane stays out.

03What happens when a provider goes down?

Smart fallback. Tell holds the chosen model, hedges to a backup if tail latency spikes, and routes around outages without changing the request your client sent. The first you'll hear about it is the dashboard.

04Is our data safe?

PII and secrets are redacted at the edge before egress. Day-to-day logs are redacted; sealed originals sit in immutable storage with break-glass controls and per-tenant KMS keys.

05Can we run our own models behind it?

Yes. Self-hosted endpoints (vLLM, TGI, in-house Triton, anything that speaks OpenAI or a known schema) join the routing pool alongside hosted providers; same constraints, same scoring, same audit.

Pricing

Pay for completions, not seats.

Cached calls are free. Routed calls are billed at provider cost plus a thin margin. The incentive sits on the right side.

Plan 01

Pilot

Free · 90 days

For closed-beta partners

Drop Tell in front of one product surface. All features unlocked, weekly check-ins, and pricing locked in for the first year.

One workspace · all providers
Semantic cache + smart routing
Direct line to the engineering team
Pricing fixed at beta rate after launch

Apply for beta

Recommended

Plan 02

Team

From $499/mo

For shipping LLM teams

Hosted, multi-region, full feature set. Usage-based billing on completions that actually leave the gateway; cached calls are free.

Unlimited prompts and providers
Per-team budgets and virtual keys
PII redaction · audit log exports
SOC2-ready data handling
Email + Slack support

Talk to us

Plan 03

Enterprise

Custom

BYOC data plane

Single-tenant deployment in your VPC. BYOK encryption, dedicated SRE, hard multi-tenancy across business units.

Self-hosted or VPC-isolated
Per-tenant KMS keys (BYOK)
SSO, SAML, SCIM
Dedicated solutions engineer
Procurement-friendly contracts

Get in touch

§ 03 · Engagement intake

01 / Start a brief

Talk to the people who’ll do the work.

We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.

If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.

→team@grouplabs.ca

Compose a brief30 min · intro

WGS84YYC / YUL

CalgaryYYC

51.05°N · 114.07°W

MontrealYUL

45.51°N · 73.55°W

Δ 3,020 km

02 / Where to find us

Calgary, Alberta

Studio HQ

+1 (587) 700-9968

Lat / Lng: 51.0486°N · 114.0708°W
Local: —:— MST · UTC−07

Montreal, Quebec

Satellite office

+1 (825) 365-9891

Lat / Lng: 45.5089°N · 73.5542°W
Local: —:— EST · UTC−05

Our offices

Follow us