Compute that arranges itself.
Mesh is a low-level fabric for distributed batch work. Nodes find each other on their own, form a topology, and run jobs partitioned by a scheduler that knows what each device is.
Drop a binary on every box. No control plane. No SRE on call. Today it runs ML batch training. The runtime is the same shape for any data-parallel batch job.
- Topology: self-arranging
- Discovery: built-in
- Devices: heterogeneous
- Runtime: custom · no JVM
When this matters
If your hardware is uneven and your scheduler can't tell.
Mesh is for teams running batch work across machines that were never meant to be a cluster.
Your fleet is mixed
A100s next to L4s next to 32-core CPUs. The scheduler you have today treats them all the same and routes the wrong work to the wrong silicon.
Bringing up a node is a project
Manifest edits, manual cluster join, restarts, paged SREs. By the time the box is in service, the workload that needed it has finished.
The control plane is heavier than the workload
Kubernetes, etcd, a service mesh, and a CNI plugin. For a batch job that runs once a day on six machines.
You want to run on whatever you have
Old GPUs in a closet, a handful of cloud instances, a dev box. Mesh treats whatever is reachable as fabric and partitions by capability.
How it works
From a binary on each box to a job split across them.
Mesh is four moving parts: discovery, gossip, topology, and a device-aware scheduler. Everything else is your workload.
Drop the binary
One static binary on each node. No supervisor, no agent, no runtime to install. The binary is the cluster.
Nodes find each other
Discovery runs on its own. Peers gossip capability info — device class, memory, free cycles. New nodes show up by joining the gossip.
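As an illustration of how gossiped capability info could converge, here is a minimal Python sketch. The `Capability` record, its field names, and the version-based merge rule are assumptions made for this example, not Mesh's actual wire format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability record gossiped by a node."""
    node_id: str
    device_class: str   # e.g. "A100", "L4", "cpu"
    memory_gb: int
    free_cycles: float  # fraction of compute currently free, 0.0 to 1.0
    version: int        # assumed: bumped by the owning node on every update


def merge_views(local: dict[str, Capability],
                remote: dict[str, Capability]) -> dict[str, Capability]:
    """Merge a peer's view into ours, keeping the newest record per node."""
    merged = dict(local)
    for node_id, cap in remote.items():
        known = merged.get(node_id)
        if known is None or cap.version > known.version:
            merged[node_id] = cap
    return merged


# A new node (n2) shows up in our view simply because a peer already knows it.
view_a = {"n0": Capability("n0", "A100", 80, 0.9, version=3)}
view_b = {
    "n0": Capability("n0", "A100", 80, 0.4, version=5),
    "n2": Capability("n2", "cpu", 128, 1.0, version=1),
}
merged = merge_views(view_a, view_b)
```

Because the merge keeps the highest version per node, it gives the same result whichever side initiates the exchange, which is the property a gossip round relies on.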
The topology self-arranges
The fabric forms without a coordinator. Nodes drop, the topology adjusts. Nothing pages anyone.
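One way a coordinator-free topology could adjust when nodes drop is sketched below in Python. The heartbeat timeout, the ring layout, and the function names are all assumptions for illustration; the source does not say which topology shape or failure detector Mesh uses.

```python
def live_peers(last_seen: dict[str, float], now: float,
               timeout_s: float = 5.0) -> set[str]:
    """Return peers whose last heartbeat is within timeout_s seconds of now."""
    return {peer for peer, seen in last_seen.items() if now - seen <= timeout_s}


def neighbors(peers: set[str], me: str, fanout: int = 2) -> list[str]:
    """Pick up to `fanout` successors of `me` on a ring sorted by node id."""
    ring = sorted(peers | {me})
    start = ring.index(me)
    others = [ring[(start + i) % len(ring)] for i in range(1, len(ring))]
    return others[:fanout]


# n2 has been silent for 9 s, so it falls out of the live set and the ring.
last_seen = {"n1": 100.0, "n2": 92.0, "n3": 99.5}
alive = live_peers(last_seen, now=101.0)
links = neighbors(alive, me="n0")
```

Each node recomputes its own links from its own view, so no central component has to notice the failure or push a new layout.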
Scheduler partitions by device
When a job arrives, the scheduler reads per-node capability and shards accordingly. GPUs get the heavy work. CPUs get the parts that suit them.
What ships
A fabric, a scheduler, and a runtime that knows the silicon.
Three things drop into place the day Mesh goes live on your fleet.
- 01
Discovery without manual config
New nodes join the gossip and the fabric absorbs them. No DNS records, no static membership lists, no rolling restart of a control plane.
- 02
Scheduling that reads the device
Per-node capability — device class, memory, current load — is gossiped continuously. The scheduler scores each candidate before it routes a shard.
Scheduler trace · last job · 7 nodes · 4 shards
- T+0.000 · event · peer-7 joined · GPU · L40 · 48GB · fabric size 6 → 7
- T+0.412 · event · job accepted · ResNet-50 · 4 shards
- T+0.503 · route · shard-0 → n0 · A100 · score 0.96
- T+0.511 · route · shard-1 → n1 · H100 · score 0.99
- T+0.518 · route · shard-2 → n3 · L4 · score 0.84
- T+0.524 · route · shard-3 → n6 · L40 · score 0.92
- T+0.530 · idle · n2, n4 → idle · cpu class · model FLOPs
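The score-then-route step can be sketched in Python as follows. The scoring formula, the `CLASS_WEIGHT` table, and the `Node` fields are hypothetical stand-ins; they do not reproduce the scores in the trace above or Mesh's real scoring function.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    device_class: str
    memory_gb: int
    load: float  # 0.0 (idle) to 1.0 (saturated)


# Hypothetical relative throughput per device class; not measured figures.
CLASS_WEIGHT = {"H100": 1.0, "A100": 0.9, "L40": 0.8, "L4": 0.6, "cpu": 0.1}


def score(node: Node, min_memory_gb: int) -> float:
    """Score a candidate in [0, 1]; 0 means the shard does not fit."""
    if node.memory_gb < min_memory_gb:
        return 0.0
    return CLASS_WEIGHT.get(node.device_class, 0.0) * (1.0 - node.load)


def route(nodes: list[Node], shards: int, min_memory_gb: int) -> dict[int, str]:
    """Assign each shard to a distinct node, best score first."""
    ranked = sorted(
        (n for n in nodes if score(n, min_memory_gb) > 0.0),
        key=lambda n: score(n, min_memory_gb),
        reverse=True,
    )
    return {i: n.name for i, n in enumerate(ranked[:shards])}


fleet = [
    Node("n0", "A100", 80, 0.0),
    Node("n1", "H100", 80, 0.0),
    Node("n2", "cpu", 256, 0.0),
    Node("n3", "L4", 24, 0.0),
]
plan = route(fleet, shards=3, min_memory_gb=16)
```

With three shards and four candidates, the CPU node scores lowest and is left idle, which mirrors the kind of decision the trace records.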
- 03
A custom runtime, no Python tax
No JVM, no kubelet sidecar, no language-tied executor. The runtime is purpose-built for partition / dispatch / reduce, and ML training is the first plugin on top.
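The partition / dispatch / reduce shape can be illustrated with a small Python sketch. `run_job` and its parameters are invented for this example and are not Mesh's plugin interface; dispatch runs in-process here, where a real fabric would send each part to a node.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_job(items: Sequence[T], shards: int,
            work: Callable[[Sequence[T]], R],
            combine: Callable[[list[R]], R]) -> R:
    """Partition items into shards, run work on each, then combine the results."""
    # Partition: contiguous slices whose sizes differ by at most one.
    base, extra = divmod(len(items), shards)
    parts, start = [], 0
    for i in range(shards):
        end = start + base + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    # Dispatch: executed locally in this sketch.
    partials = [work(part) for part in parts]
    # Reduce: fold the per-shard results into one value.
    return combine(partials)


# Summing 0..9 in three shards gives the same answer as summing directly.
total = run_job(list(range(10)), shards=3, work=sum, combine=sum)
```

Swapping `work` and `combine` is the only change needed to run a different batch job through the same three steps.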
Why Mesh
Most schedulers assume your fleet is identical. Most fleets aren't.
Mesh is built around the case where every box is different — and where there is no team to run a control plane for it.
01
Built for the fleet you actually have.
Most schedulers assume identical workers. Mesh assumes the opposite. Heterogeneous is the default, and the scheduler is written for that case from the start.
02
No control plane to babysit.
There is no head node. There is no etcd. The fabric is the cluster, and the cluster is what's running. When a node leaves, the topology adjusts. Nothing pages anyone.
What Mesh isn't
Three things people assume — and the actual answer.
The category is crowded. Here's how Mesh sits relative to the things it gets compared to.
Not Kubernetes
No cluster bootstrap. No kubelet. No YAML to roll out a job. Mesh is a binary and a scheduler, not a platform. If you needed K8s, you'd already have it.
Not Ray
Ray is a Python-native task framework with a head node. Mesh is lower-level: a fabric and a scheduler, not a programming model. ML training sits on top as a plugin.
Not Slurm
Slurm assumes a static partition and a shared filesystem. Mesh assumes neither. Nodes come and go on their own, and the scheduler is built for mixed silicon.
Common questions
What teams ask before they install.
01 What workloads does it run today?
ML batch training. Data-parallel training across heterogeneous GPUs is the live workload, with shard sizing handled by the device-aware scheduler. The runtime is workload-agnostic; training is the first plugin on top.
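A rough Python sketch of device-aware shard sizing: split a global batch across nodes in proportion to a per-node weight. The weights and the `size_shards` helper are assumptions for illustration, not figures or functions from Mesh.

```python
def size_shards(batch: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `batch` samples across nodes in proportion to their weights."""
    total = sum(weights.values())
    sizes = {node: int(batch * w / total) for node, w in weights.items()}
    # Hand out the samples lost to rounding down, largest weight first.
    leftover = batch - sum(sizes.values())
    for node in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        sizes[node] += 1
    return sizes


# Hypothetical relative throughputs for three GPU classes.
sizes = size_shards(256, {"a100": 3.0, "l40": 2.0, "l4": 1.0})
```

The faster device gets the larger slice, so all nodes should finish a step at roughly the same time instead of waiting on the slowest one.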
02 Can we run other things on it?
Yes, anything batch-shaped that fits the partition / dispatch / reduce pattern. Inference batches, simulation sweeps, large feature pipelines. The runtime is the same; only the plugin on top changes.
03 How do nodes discover each other?
Mesh ships with a peer-gossip discovery layer. On a flat L2 network, that is all you need. On segmented networks, you point new nodes at any existing peer and the rest happens on its own.
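To show why a single existing peer is enough on a segmented network, here is a small Python sketch of a join that walks outward from one seed. The `join` function and the in-memory `views` map are hypothetical stand-ins for network calls; Mesh's real join protocol is not described in this page.

```python
def join(seed: str, views: dict[str, set[str]], me: str) -> set[str]:
    """Learn the full membership by walking peers reachable from one seed."""
    known: set[str] = set()
    frontier = [seed]
    while frontier:
        peer = frontier.pop()
        if peer in known or peer == me:
            continue
        known.add(peer)
        # Ask this peer who it knows; in a real fabric this is a network call.
        frontier.extend(views.get(peer, set()))
    return known


# Each existing node's current peer list (in-memory stand-in).
views = {"n0": {"n1"}, "n1": {"n0", "n2"}, "n2": {"n1"}}
members = join("n2", views, me="n3")
```

The new node only names `n2`, yet ends up knowing every node that `n2` can reach through its peers.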
04 Is there a head node or a control plane?
No. The fabric is the scheduler. Coordination is gossip, consensus is local, and jobs run wherever they fit. There is no etcd to lose, no head to fail over.
05 When can we use it?
v0.2 is in development. We are piloting with a small number of teams running heterogeneous training. If that sounds like you, get in touch and we will scope a deployment together.
Pricing
Pay for the fleet, not the workload.
Mesh is sized to nodes, not jobs. The more you put through the fabric, the cheaper each job becomes.
Plan 01
Pilot
Co-built
No cost during v0
We deploy Mesh on your fleet alongside your team. Direct line to the engineers building it, and pricing locked in at v1 release.
- Up to one fleet, any size
- Co-built deployment and integration
- Direct line to the engineering team
- Locked-in v1 pricing
Plan 02
Cluster
From $1,500/mo
For production fleets
Production install across your fleet. Fixed monthly fee sized to node count, no per-job billing, no per-GPU surcharges.
- Unlimited jobs and shards
- Heterogeneous device support
- Scheduler trace and audit log
- SOC2-ready data handling
- Email + Slack support
Plan 03
Custom
Custom
Self-hosted, integrated
Self-hosted deployment, custom plugins, deep integration with your job submission and observability stack. For regulated environments and large fleets.
- Self-hosted or air-gapped
- Custom plugins and runtime hooks
- SSO, SAML, audit retention
- Dedicated solutions engineer
- Procurement-friendly contracts
§ 03 · Engagement intake
01 / Start a brief
Talk to the people who’ll do the work.
We staff small and senior, scope by phase, and end on a written deliverable. We don’t sell decks or hours.
If we’re not the right team for the job, we say so on the first call. The bar is production, not pitch.
02 / Where to find us
Montreal, Quebec
Satellite office
- Lat / Lng: 45.5089°N · 73.5542°W
- Local: EST · UTC−05