Mesh / A product from GroupLabs
Self-arranging compute fabric

Compute that arranges itself.

Mesh is a low-level fabric for distributed batch work. Nodes find each other on their own, form a topology, and run jobs partitioned by a scheduler that knows what each device is.

Drop a binary on every box. No control plane. No SRE on call. Today it runs ML batch training. The runtime is the same shape for any data-parallel batch job.

v0.2 · in development
Cluster view: mesh.cluster · 6 → 7 nodes · live
Nodes: A100 (gpu), H100 (gpu), 64c (cpu · idle), L4 (gpu), 32c (cpu · idle), A10 (gpu), L40 (gpu)
Discover: peer-6 joined · GPU · L40
Mesh runtime: phase 1 / 4
Topology: self-arranging
Discovery: built-in
Devices: heterogeneous
Runtime: custom · no JVM

When this matters

If your hardware is uneven and your scheduler can't tell.

Mesh is for teams running batch work across machines that were never meant to be a cluster.

    01

    Your fleet is mixed

    A100s next to L4s next to 32-core CPUs. The scheduler you have today treats them all the same and routes the wrong work to the wrong silicon.

    02

    Bringing up a node is a project

    Manifest edits, manual cluster join, restarts, paged SREs. By the time the box is in service, the workload that needed it has finished.

    03

    The control plane is heavier than the workload

    Kubernetes, etcd, a service mesh, and a CNI plugin. For a batch job that runs once a day on six machines.

    04

    You want to run on whatever you have

    Old GPUs in a closet, a handful of cloud instances, a dev box. Mesh treats whatever is reachable as fabric and partitions by capability.

How it works

From a binary on each box to a job split across them.

Mesh is four moving parts: discovery, gossip, topology, and a device-aware scheduler. Everything else is your workload.

    Step 01

    Drop the binary

    One static binary on each node. No supervisor, no agent, no runtime to install. The binary is the cluster.

    Step 02

    Nodes find each other

    Discovery runs on its own. Peers gossip capability info — device class, memory, free cycles. New nodes show up by joining the gossip.

    Step 03

    The topology self-arranges

    The fabric forms without a coordinator. Nodes drop, the topology adjusts. Nothing pages anyone.

    Step 04

    Scheduler partitions by device

    Jobs arrive, the scheduler reads per-node capability and shards accordingly. GPUs get the heavy work. CPUs get the parts that suit them.
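
Steps 02 and 03 above can be pictured with a small sketch. This is not Mesh's implementation: the `PeerInfo` record, its field names, the version counter, and the 10-second timeout are all assumptions made for illustration. It shows one common way a gossip layer could merge capability records from a peer and forget nodes that stop refreshing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    """One gossiped capability record (hypothetical field names)."""
    node_id: str
    device_class: str   # e.g. "A100", "L4", "cpu-64c"
    memory_gb: int
    version: int        # assumed: bumped by the owning node on every refresh
    last_seen: float    # local receive time, in seconds


def merge_view(local, remote, now):
    """Merge a peer's view into ours, keeping the newest record per node."""
    merged = dict(local)
    for node_id, info in remote.items():
        known = merged.get(node_id)
        if known is None or info.version > known.version:
            # Stamp with our own clock so staleness is judged locally.
            merged[node_id] = PeerInfo(info.node_id, info.device_class,
                                       info.memory_gb, info.version, now)
    return merged


def drop_stale(view, now, timeout=10.0):
    """Forget peers whose records were not refreshed within `timeout` seconds."""
    return {nid: p for nid, p in view.items() if now - p.last_seen <= timeout}


# A new node (n6) appears in a peer's view and is absorbed on merge.
ours = {"n0": PeerInfo("n0", "A100", 80, 1, 0.0)}
theirs = {"n0": PeerInfo("n0", "A100", 80, 1, 0.0),
          "n6": PeerInfo("n6", "L40", 48, 1, 0.0)}
view = merge_view(ours, theirs, now=1.0)
```

In this sketch a node that stops bumping its version simply ages out of every view, so there is no coordinator that has to notice the departure.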

What ships

A fabric, a scheduler, and a runtime that knows the silicon.

Three things drop into place the day Mesh goes live on your fleet.

  • 01

    Discovery without manual config

    New nodes join the gossip and the fabric absorbs them. No DNS records, no static membership lists, no rolling restart of a control plane.

  • 02

    Scheduling that reads the device

    Per-node capability — device class, memory, current load — is gossiped continuously. The scheduler scores each candidate before it routes a shard.

    Scheduler trace · last job · 7 nodes · 4 shards
    • T+0.000  event  peer-7 joined · GPU · L40 · 48GB  fabric size 6 → 7
    • T+0.412  event  job accepted · ResNet-50 · 4 shards
    • T+0.503  route  shard-0 → n0 · A100  score 0.96
    • T+0.511  route  shard-1 → n1 · H100  score 0.99
    • T+0.518  route  shard-2 → n3 · L4  score 0.84
    • T+0.524  route  shard-3 → n6 · L40  score 0.92
    • T+0.530  idle  n2, n4 → idle  cpu class · model FLOPs
  • 03

    A custom runtime, no Python tax

    No JVM, no kubelet sidecar, no language-tied executor. The runtime is purpose-built for partition / dispatch / reduce, and ML training is the first plugin on top.
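
The "score each candidate, then route" idea from item 02 can be sketched as follows. The scoring formula, the 0.7 / 0.3 weights, the `min_score` cutoff, and every name here are assumptions for illustration only; the page does not describe how Mesh actually computes its scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str
    device_class: str
    rel_throughput: float  # 0..1, relative to the fastest device in the fleet
    memory_gb: float
    load: float            # 0..1, fraction of the device currently busy


def score(c, shard_memory_gb):
    """Return 0.0 if the shard cannot fit, else a value in [0, 1]."""
    if c.memory_gb < shard_memory_gb:
        return 0.0
    # Hypothetical weighting: raw speed matters more than current headroom.
    return 0.7 * c.rel_throughput + 0.3 * (1.0 - c.load)


def route(shards, candidates, shard_memory_gb, min_score=0.5):
    """Assign each shard to the best unused node; low scorers stay idle.

    Shards beyond the number of eligible nodes are left unassigned here.
    """
    ranked = sorted(candidates, key=lambda c: score(c, shard_memory_gb),
                    reverse=True)
    eligible = [c for c in ranked if score(c, shard_memory_gb) >= min_score]
    return {shard: node.node_id for shard, node in zip(shards, eligible)}


fleet = [
    Candidate("n0", "A100", rel_throughput=0.9, memory_gb=80, load=0.1),
    Candidate("n1", "H100", rel_throughput=1.0, memory_gb=80, load=0.0),
    Candidate("n2", "cpu-64c", rel_throughput=0.05, memory_gb=256, load=0.0),
]
plan = route(["shard-0", "shard-1"], fleet, shard_memory_gb=24)
```

With these made-up numbers the CPU node scores below the cutoff and stays idle, which mirrors the shape of the trace above without claiming to reproduce its scores.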

Why Mesh

Most schedulers assume your fleet is identical. Most fleets aren't.

Mesh is built around the case where every box is different — and where there is no team to run a control plane for it.

    01

    Built for the fleet you actually have.

    Most schedulers assume identical workers. Mesh assumes the opposite. Heterogeneous is the default, and the scheduler is written for that case from the start.

    02

    No control plane to babysit.

    There is no head node. There is no etcd. The fabric is the cluster, and the cluster is what's running. When a node leaves, the topology adjusts. Nothing pages anyone.

What Mesh isn't

Three things people assume — and the actual answer.

The category is crowded. Here's how Mesh sits relative to the things it gets compared to.

    01

    Not Kubernetes

    No cluster bootstrap. No kubelet. No YAML to roll out a job. Mesh is a binary and a scheduler, not a platform. If you needed K8s, you'd already have it.

    02

    Not Ray

    Ray is a Python-native task framework with a head node. Mesh is lower-level: a fabric and a scheduler, not a programming model. ML training sits on top as a plugin.

    03

    Not Slurm

    Slurm assumes a static partition and a shared filesystem. Mesh assumes neither. Nodes come and go on their own, and the scheduler is built for mixed silicon.

Common questions

What teams ask before they install.

    01  What workloads does it run today?

    ML batch training. Data-parallel training across heterogeneous GPUs is the live workload, with shard sizing handled by the device-aware scheduler. The runtime is workload-agnostic; training is the first plugin on top.

    02  Can we run other things on it?

    Yes, anything batch-shaped that fits the partition / dispatch / reduce pattern. Inference batches, simulation sweeps, large feature pipelines. The runtime is the same; only the plugin on top changes.

    03  How do nodes discover each other?

    Mesh ships with a peer-gossip discovery layer. On a flat L2 network, that is all you need. On segmented networks, you point new nodes at any existing peer and the rest happens on its own.

    04  Is there a head node or a control plane?

    No. The fabric is the scheduler. Coordination is gossip, consensus is local, and jobs run wherever they fit. There is no etcd to lose, no head to fail over.

    05  When can we use it?

    v0.2 is in development. We are piloting with a small number of teams running heterogeneous training. If that sounds like you, get in touch and we will scope a deployment together.
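
For readers unfamiliar with the partition / dispatch / reduce pattern mentioned in the answers above, here is a minimal sketch. It is not the Mesh plugin API, which this page does not document: `partition`, `run_job`, and the capability weights are hypothetical, and local threads stand in for remote nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, weights):
    """Split `items` into contiguous chunks sized in proportion to `weights`."""
    total = sum(weights)
    chunks, start, acc = [], 0, 0.0
    for w in weights:
        acc += w
        end = round(len(items) * acc / total)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_job(items, weights, work, combine):
    """Partition, dispatch one chunk per worker, then reduce the partials."""
    chunks = partition(items, weights)
    # Threads stand in for nodes in this sketch; nothing is sent over a network.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(work, chunks))
    return combine(partials)


# One fast device (weight 3) and two slower ones (weight 1 each).
chunks = partition(list(range(10)), [3, 1, 1])
total = run_job(list(range(10)), [3, 1, 1], work=sum, combine=sum)
```

The weights play the role of per-node capability: a device weighted three times higher receives roughly three times as many items, which is one simple way to size shards for uneven hardware.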

Pricing

Pay for the fleet, not the workload.

Mesh is priced by node count, not by job. The more you put through the fabric, the cheaper each job becomes.

    Plan 01

    Pilot

    Co-built

    No cost during v0

    We deploy Mesh on your fleet alongside your team. Direct line to the engineers building it, and pricing locked in at v1 release.

    • Up to one fleet, any size
    • Co-built deployment and integration
    • Direct line to the engineering team
    • Locked-in v1 pricing
    Recommended

    Plan 02

    Cluster

    From $1,500/mo

    For production fleets

    Production install across your fleet. Fixed monthly fee sized to node count, no per-job billing, no per-GPU surcharges.

    • Unlimited jobs and shards
    • Heterogeneous device support
    • Scheduler trace and audit log
    • SOC2-ready data handling
    • Email + Slack support

    Plan 03

    Custom

    Custom

    Self-hosted, integrated

    Self-hosted deployment, custom plugins, deep integration with your job submission and observability stack. For regulated environments and large fleets.

    • Self-hosted or air-gapped
    • Custom plugins and runtime hooks
    • SSO, SAML, audit retention
    • Dedicated solutions engineer
    • Procurement-friendly contracts

§ 03  ·  Engagement intake

01 / Start a brief

Talk to the people who’ll do the work.

We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.

If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.

team@grouplabs.ca

02 / Where to find us

01

Calgary, Alberta

Studio HQ
+1 (587) 700-9968
Lat / Lng: 51.0486°N · 114.0708°W
Local time: MST · UTC−07

02

Montreal, Quebec

Satellite office
+1 (825) 365-9891
Lat / Lng: 45.5089°N · 73.5542°W
Local time: EST · UTC−05