Mesh · A product from GroupLabs

Self-arranging compute fabric

Compute that arranges itself.

Mesh is a low-level fabric for distributed batch work. Nodes find each other on their own, form a topology, and run jobs partitioned by a scheduler that knows what each device is.

Drop a binary on every box. No control plane. No SRE on call. Today it runs ML batch training. The runtime is the same shape for any data-parallel batch job.

Talk to us · See how it works →

v0.2 · in development

[Live cluster view: mesh.cluster · 6→7 nodes. GPU nodes: A100, H100, A10, L40, and one unlabeled GPU. CPU nodes (idle): 64c, 32c. Discover: peer-6 joined · GPU · L40]

Topology: self-arranging
Discovery: built-in
Devices: heterogeneous
Runtime: custom · no JVM

When this matters

If your hardware is uneven and your scheduler can't tell.

Mesh is for teams running batch work across machines that were never meant to be a cluster.

Your fleet is mixed

A100s next to L4s next to 32-core CPUs. The scheduler you have today treats them all the same and routes the wrong work to the wrong silicon.

Bringing up a node is a project

Manifest edits, manual cluster join, restarts, paged SREs. By the time the box is in service, the workload that needed it has finished.

The control plane is heavier than the workload

Kubernetes, etcd, a service mesh, and a CNI plugin. For a batch job that runs once a day on six machines.

You want to run on whatever you have

Old GPUs in a closet, a handful of cloud instances, a dev box. Mesh treats whatever is reachable as fabric and partitions by capability.

How it works

From a binary on each box to a job split across them.

Mesh is four moving parts. Discovery, gossip, topology, and a device-aware scheduler. Everything else is your workload.

Step 01

Drop the binary

One static binary on each node. No supervisor, no agent, no runtime to install. The binary is the cluster.

Step 02

Nodes find each other

Discovery runs on its own. Peers gossip capability info — device class, memory, free cycles. New nodes show up by joining the gossip.

Step 03

The topology self-arranges

The fabric forms without a coordinator. Nodes drop, the topology adjusts. Nothing pages anyone.

Step 04

Scheduler partitions by device

Jobs arrive, the scheduler reads per-node capability and shards accordingly. GPUs get the heavy work. CPUs get the parts that suit them.
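
A minimal sketch of that last step, written in Python purely for illustration (it says nothing about how Mesh itself is implemented). The `Node` fields, the scoring formula, the GPU-only assumption for this job, and the fleet values are all hypothetical; this page does not publish Mesh's actual scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    device_class: str  # "gpu" or "cpu"
    memory_gb: float
    load: float        # 0.0 (idle) to 1.0 (saturated)


def score(node, min_memory_gb):
    """Hypothetical score in [0, 1]; 0.0 means the node is not eligible.

    Assumption: this job's shards need a GPU with enough memory.
    """
    if node.device_class != "gpu" or node.memory_gb < min_memory_gb:
        return 0.0
    return round(1.0 - node.load, 2)


def route(nodes, n_shards, min_memory_gb):
    """Assign one shard to each of the best-scoring eligible nodes."""
    eligible = [n for n in nodes if score(n, min_memory_gb) > 0.0]
    ranked = sorted(eligible, key=lambda n: score(n, min_memory_gb), reverse=True)
    return {f"shard-{i}": n.name for i, n in enumerate(ranked[:n_shards])}


# Made-up fleet: three GPUs with different load, one idle CPU box.
fleet = [
    Node("n0", "gpu", 80, 0.04),
    Node("n1", "gpu", 80, 0.01),
    Node("n2", "cpu", 256, 0.00),
    Node("n3", "gpu", 24, 0.16),
]
assignment = route(fleet, n_shards=2, min_memory_gb=16)
```

Here the CPU node scores zero and stays idle, and the two least-loaded GPUs each receive a shard. A real scorer would weigh more signals than load alone.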

What ships

A fabric, a scheduler, and a runtime that knows the silicon.

Three things drop into place the day Mesh goes live on your fleet.

01
Discovery without manual config
New nodes join the gossip and the fabric absorbs them. No DNS records, no static membership lists, no rolling restart of a control plane.
02
Scheduling that reads the device
Per-node capability — device class, memory, current load — is gossiped continuously. The scheduler scores each candidate before it routes a shard.
Scheduler trace · last job · 7 nodes · 4 shards
- T+0.000  event  peer-7 joined · GPU · L40 · 48GB  (fabric size 6 → 7)
- T+0.412  event  job accepted · ResNet-50 · 4 shards
- T+0.503  route  shard-0 → n0 · A100  (score 0.96)
- T+0.511  route  shard-1 → n1 · H100  (score 0.99)
- T+0.518  route  shard-2 → n3 · L4  (score 0.84)
- T+0.524  route  shard-3 → n6 · L40  (score 0.92)
- T+0.530  idle   n2, n4 → idle  (cpu class · model FLOPs)
03
A custom runtime, no Python tax
No JVM, no kubelet sidecar, no language-tied executor. The runtime is purpose-built for partition / dispatch / reduce, and ML training is the first plugin on top.

Why Mesh

Most schedulers assume your fleet is identical. Most fleets aren't.

Mesh is built around the case where every box is different — and where there is no team to run a control plane for it.

Built for the fleet you actually have.

Most schedulers assume identical workers. Mesh assumes the opposite. Heterogeneous is the default, and the scheduler is written for that case from the start.

No control plane to babysit.

There is no head node. There is no etcd. The fabric is the cluster, and the cluster is what's running. When a node leaves, the topology adjusts. Nothing pages anyone.

What Mesh isn't

Three things people assume — and the actual answer.

The category is crowded. Here's how Mesh sits relative to the things it gets compared to.

Not Kubernetes

No cluster bootstrap. No kubelet. No YAML to roll out a job. Mesh is a binary and a scheduler, not a platform. If you needed K8s, you'd already have it.

Not Ray

Ray is a Python-native task framework with a head node. Mesh is lower-level: a fabric and a scheduler, not a programming model. ML training sits on top as a plugin.

Not Slurm

Slurm assumes a static partition and a shared filesystem. Mesh assumes neither. Nodes come and go on their own, and the scheduler is built for mixed silicon.

Common questions

What teams ask before they install.

01 · What workloads does it run today?

ML batch training. Data-parallel training across heterogeneous GPUs is the live workload, with shard sizing handled by the device-aware scheduler. The runtime is workload-agnostic; training is the first plugin on top.
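
One way to picture device-aware shard sizing is to split a global batch in proportion to each GPU's relative throughput. The sketch below is in Python for readability only; the throughput weights are invented for the example, and Mesh's real sizing policy is not described on this page.

```python
def size_shards(global_batch, throughput):
    """Split global_batch across nodes in proportion to relative throughput.

    Uses largest-remainder rounding so the sizes always sum to global_batch.
    """
    total = sum(throughput.values())
    exact = {name: global_batch * t / total for name, t in throughput.items()}
    sizes = {name: int(x) for name, x in exact.items()}
    leftover = global_batch - sum(sizes.values())
    # Hand the leftover samples to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - sizes[n], reverse=True)
    for name in by_remainder[:leftover]:
        sizes[name] += 1
    return sizes


# Hypothetical relative throughputs, not measured numbers.
weights = {"A100": 3.0, "H100": 5.0, "L4": 1.0, "L40": 2.0}
sizes = size_shards(1024, weights)
```

Faster devices receive proportionally larger shards, so every node finishes a step at roughly the same time instead of waiting on the slowest card.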

02 · Can we run other things on it?

Yes, anything batch-shaped that fits the partition / dispatch / reduce pattern. Inference batches, simulation sweeps, large feature pipelines. The runtime is the same; only the plugin on top changes.
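
The partition / dispatch / reduce shape fits in a few lines. This is a generic sketch in Python, not Mesh's plugin API (which this page does not document): `partition`, `run_job`, and the thread pool standing in for remote nodes are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_job(items, n_parts, work, reduce_fn):
    """Partition the input, dispatch each chunk to a worker, reduce the results."""
    chunks = partition(items, n_parts)
    # Local threads stand in for nodes in the fabric.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(work, chunks))
    return reduce_fn(partials)


# Sum 1..100 in four chunks: each worker sums its chunk, then the sums are added.
total = run_job(list(range(1, 101)), 4, sum, sum)
```

Swapping `work` and `reduce_fn` is the analogue of changing the plugin: an inference batch, a simulation sweep, or a feature pipeline keeps the same three-stage outline.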

03 · How do nodes discover each other?

Mesh ships with a peer-gossip discovery layer. On a flat L2 network, that is all you need. On segmented networks, you point new nodes at any existing peer and the rest happens on its own.
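
To see why a single known peer is enough, here is a toy simulation of membership gossip. It is a generic push-pull gossip sketch in Python, not Mesh's protocol: the node names, the round structure, and the choice of `n0` as the one peer every newcomer is pointed at are assumptions for illustration.

```python
import random


def gossip_round(views, rng):
    """Each node swaps member lists with one random peer it already knows."""
    for node in list(views):
        peers = sorted(views[node] - {node})
        if not peers:
            continue  # knows nobody yet; waits to be contacted
        peer = rng.choice(peers)
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_to_converge(n_nodes, seed=0, max_rounds=100):
    """Return how many rounds pass until every node knows every other node."""
    rng = random.Random(seed)
    names = [f"n{i}" for i in range(n_nodes)]
    # Every node starts out knowing only itself and the existing peer n0.
    views = {name: {name, "n0"} for name in names}
    everyone = set(names)
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_no
    return None  # did not converge within max_rounds


rounds = rounds_to_converge(7)
```

Membership spreads because every exchange merges two views, so knowledge of a new node reaches the rest of the fabric without any static list.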

04 · Is there a head node or a control plane?

No. The fabric is the scheduler. Coordination is gossip, consensus is local, and jobs run wherever they fit. There is no etcd to lose, no head to fail over.

05 · When can we use it?

v0.2 is in development. We are piloting with a small number of teams running heterogeneous training. If that sounds like you, get in touch and we will scope a deployment together.

Pricing

Pay for the fleet, not the workload.

Mesh is priced by node count, not by job. The more you put through the fabric, the cheaper each job becomes.

Plan 01

Pilot

Co-built

No cost during v0

We deploy Mesh on your fleet alongside your team. Direct line to the engineers building it, and pricing locked in at v1 release.

Up to one fleet, any size
Co-built deployment and integration
Direct line to the engineering team
Locked-in v1 pricing

Apply for pilot

Plan 02 · Recommended

Cluster

From $1,500/mo

For production fleets

Production install across your fleet. Fixed monthly fee sized to node count, no per-job billing, no per-GPU surcharges.

Unlimited jobs and shards
Heterogeneous device support
Scheduler trace and audit log
SOC2-ready data handling
Email + Slack support

Talk to us

Plan 03

Custom

Self-hosted, integrated

Self-hosted deployment, custom plugins, deep integration with your job submission and observability stack. For regulated environments and large fleets.

Self-hosted or air-gapped
Custom plugins and runtime hooks
SSO, SAML, audit retention
Dedicated solutions engineer
Procurement-friendly contracts

Get in touch

§ 03 · Engagement intake

01 / Start a brief

Talk to the people who’ll do the work.

We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.

If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.

→ team@grouplabs.ca

Compose a brief · 30 min · intro

02 / Where to find us

Calgary, Alberta

Studio HQ

+1 (587) 700-9968

Lat / Lng: 51.0486°N · 114.0708°W
Local time zone: MST · UTC−07

Montreal, Quebec

Satellite office

+1 (825) 365-9891

Lat / Lng: 45.5089°N · 73.5542°W
Local time zone: EST · UTC−05
