Tell/A product from GroupLabsLLM gateway

Every LLM request, routed and accountable.

Tell is a Rust-based gateway between your product and every LLM provider. It caches, routes, redacts, and audits, at sixty thousand requests per second, with sub-millisecond overhead.

No SDK rewrite. No vendor lock-in. No quiet model swaps that hit your customers before they hit your dashboard.

tell.gateway / v1Live
rps62.3k
p504.3 ms
cache58%
err0.00%
request received·· ms
providers4/4 feasible
openaigpt-5us-east47%
anthropicsonnet-4us-east28%
bedrockhaiku-4us-west16%
vertexgemini-2eu-west9%
semantic cache823 keyshit 58%
decisionstep 1/4

POST /v1/chat/completions · json_mode · ctx 32k · region us

Live gateway · last 60sroutes · cache · audit
Throughput
60k+ rps
Overhead
< 1 ms
Mean response
4.3 ms
Failures @ 10k conc.
0

When this matters

If you ship LLM features, you've already met these problems.

Tell is for teams whose product talks to a model on every meaningful click, and whose ops, finance, and legal teams are starting to notice.

    01

    You picked a provider, and now you live there

    One SDK, one bill, one failure mode. The model that beat yours on price last week sits on the other side of a rewrite you keep deferring.

    02

    Spend is creeping and nobody owns it

    The same prompt is generating the same answer for the tenth time today. Two teams are calling GPT-5 for tasks 4o-mini would solve. There is no budget by team, no cap by project.

    03

    Outages are silent until they are loud

    A model gets deprecated. A fingerprint flips overnight. A region quietly throttles. You find out the same way your customers do: through the support inbox.

    04

    Compliance keeps asking what left the building

    PII rides along in the prompts. Secrets land in the logs. There is no audit trail that survives a regulator, and no redaction layer that survives a code review.

How it works

From your client to a provider, through one accountable hop.

Tell sits on the wire. Every prompt your product makes, every response a model returns, passes through the same place, and gets the same treatment.

    01Step

    Point at the URL

    Tell speaks the OpenAI API. Swap one base URL in your existing client; keys, types, and streaming all keep working. No SDK rewrite, no new abstractions to learn.

    02Step

    Solve before you spend

    A constraint solver filters providers by region, context length, cost ceiling, and JSON mode. A multi-objective bandit then scores the survivors on cost, latency, and quality.

    03Step

    Cache, redact, route

    A semantic cache check answers similar prompts in under a millisecond. PII and secrets are redacted at the edge. The chosen provider streams the response back in coherent chunks.

    04Step

    Log every byte

    Prompt, params, route decision, tool traces, fingerprints, confidence map, all attached to one trace. Sealed originals; redacted day-to-day logs; audit-ready exports on demand.

What ships

Cheaper traffic. Calmer outages. One audit trail.

Three things drop into place the day Tell goes live in front of your stack.

  • 01

    A router that picks the cheapest model that still works

    Constraints first, scoring second. Fewer wasted GPT-5 calls on tasks 4o-mini would handle; fewer truncated 4o-mini calls on prompts that actually needed the bigger context.

  • 02

    A semantic cache that pays for itself

    Vector lookups on incoming prompts. Forty to sixty percent of traffic stops short of a provider; the answer was already on the shelf, ten characters of paraphrase away.

  • 03

    A single audit trail across every provider

    Prompts, params, route decisions, fingerprints, redactions, all in one trace. The forensics bundle attaches itself to the incident, not to the engineer who happens to be on call.

    Live traffic · last 60 seconds
    3.7M reqs58% cached$214 spend
    • summarise this 12k-token transcript

      anthropic / sonnet-4
      312 ms
    • classify ticket sentiment

      served from semantic cache

      cache · 0.94 sim
      0.8 ms
    • extract invoice fields → JSON

      json repair · 1 fixup

      openai / gpt-4.1
      198 ms
    • draft customer email re: refund

      PII redacted · 2 spans

      bedrock / haiku-4
      267 ms
tell.gateway / v1Live
rps62.3k
p504.3 ms
cache58%
err0.00%
request received·· ms
providers4/4 feasible
openaigpt-5us-east47%
anthropicsonnet-4us-east28%
bedrockhaiku-4us-west16%
vertexgemini-2eu-west9%
semantic cache823 keyshit 58%
decisionstep 1/4

POST /v1/chat/completions · json_mode · ctx 32k · region us

Live gateway · last 60sroutes · cache · audit

Why Tell

A gateway is the right place for this.

The same problems show up in every LLM stack. Solving them once, on the wire, beats solving them four times in four apps.

    01

    It's a gateway, not a wrapper.

    It lives on the wire. Every request, every retry, every failover passes through the same place, so policy, cost, and audit are written once, not re-implemented per app.

    02

    It's built for the day a provider changes.

    Model fingerprints watched, behavioural contracts enforced, shadow traffic compared. When a provider quietly swaps a checkpoint, you find out before your customers do.

What Tell isn't

Three things people assume, and the actual answer.

The category is full of half-fits. Here's what makes Tell structurally different.

    01

    Not a thin proxy

    It routes on cost, latency, and quality. It repairs JSON output to schema. It hedges tail latency across providers and cancels on first success. A reverse proxy doesn't.

    02

    Not an SDK abstraction

    Your code keeps using the OpenAI client. Tell is the URL it points at. No new types, no rewrites, no second SDK to lock yourself into.

    03

    Not a separate observability tool

    Request, response, prompt diff, route decision, cost, confidence: same trace. One pane, not three integrations and a Looker dashboard.

Common questions

What teams ask before they put it on the wire.

    01Do we have to rewrite our app?

    No. Tell speaks the OpenAI API. Point your existing OpenAI client at Tell's URL: keys, types, streaming, tool calls all keep working. The migration is a config change, not a project.

    02Where does it run?

    Hosted, per-region, with a global control plane and per-region data planes. Or BYOC inside your VPC, where the data plane stays in your residency zone and the control plane stays out.

    03What happens when a provider goes down?

    Smart fallback. Tell holds the chosen model, hedges to a backup if tail latency spikes, and routes around outages without changing the request your client sent. The first you'll hear about it is the dashboard.

    04Is our data safe?

    PII and secrets are redacted at the edge before egress. Day-to-day logs are redacted; sealed originals sit in immutable storage with break-glass controls and per-tenant KMS keys.

    05Can we run our own models behind it?

    Yes. Self-hosted endpoints (vLLM, TGI, in-house Triton, anything that speaks OpenAI or a known schema) join the routing pool alongside hosted providers; same constraints, same scoring, same audit.

Pricing

Pay for completions, not seats.

Cached calls are free. Routed calls are billed at provider cost plus a thin margin. The incentive sits on the right side.

    Plan 01

    Pilot

    Free · 90 days

    For closed-beta partners

    Drop Tell in front of one product surface. All features unlocked, weekly check-ins, and pricing locked in for the first year.

    • One workspace · all providers
    • Semantic cache + smart routing
    • Direct line to the engineering team
    • Pricing fixed at beta rate after launch
    Recommended

    Plan 02

    Team

    From $499/mo

    For shipping LLM teams

    Hosted, multi-region, full feature set. Usage-based billing on completions that actually leave the gateway; cached calls are free.

    • Unlimited prompts and providers
    • Per-team budgets and virtual keys
    • PII redaction · audit log exports
    • SOC2-ready data handling
    • Email + Slack support

    Plan 03

    Enterprise

    Custom

    BYOC data plane

    Single-tenant deployment in your VPC. BYOK encryption, dedicated SRE, hard multi-tenancy across business units.

    • Self-hosted or VPC-isolated
    • Per-tenant KMS keys (BYOK)
    • SSO, SAML, SCIM
    • Dedicated solutions engineer
    • Procurement-friendly contracts

§ 03  ·  Engagement intake

01 / Start a brief

Talk to the people who’ll do the work.

We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.

If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.

team@grouplabs.ca
Compose a brief30 min · intro
WGS84YYC / YUL
CalgaryYYC
51.05°N · 114.07°W
MontrealYUL
45.51°N · 73.55°W
Δ 3,020 km

02 / Where to find us

01

Calgary, Alberta

Studio HQ
+1 (587) 700-9968
Lat / Lng
51.0486°N · 114.0708°W
Local
—:— MST · UTC−07
02

Montreal, Quebec

Satellite office
+1 (825) 365-9891
Lat / Lng
45.5089°N · 73.5542°W
Local
—:— EST · UTC−05