Ship Racks with Performance Confidence.

Automated AI rack and cluster validation. Ready to generate revenue at delivery on Day 1.

Download Now

The Problem

Data Centers Receive Racks Failing to Perform / Generate Revenue on Day 1

“Data centers receive misassembled, misconfigured, dead-on-arrival components.” — Together AI

Components tested in isolation, not as a system under AI workloads.

Silent failures like RCCL bandwidth and NVLink errors go undetected.

Operators discover underperformance only after the rack is live.

$2M+

at stake in a single 32-GPU B200 rack.

Every additional week in L12 validation or post-delivery debug is a direct hit to OEM gross margin and operator time-to-revenue.

ClusterReady closes this gap — before the rack ships.

Key Performance Metrics

ClusterReady

Validate with AI workloads and XPerf’s proprietary test suite at 10× faster validation. Get the remedy and optimization — not just the failure.

10× FASTER

Faster Validation

AI workloads + XPerf proprietary test suite take validation from weeks to hours.

HIDDEN ISSUES

Find What L11/L12 Misses

Silent failures — RCCL bandwidth, NVLink errors — caught before shipment.

SUGGESTED FIX

Root Cause + Remedy

Not just a failure flag — the ONLY tool providing diagnosis, remedy, and optimization.

Demo

See ClusterReady in action

How It Works

Bring Assembled Rack to Production Ready

Four automated stages take every rack from arrival to a verified, optimized, revenue-ready state.

01/

Pre-Flight Health Check

30+ automated checks. GPU · CPU · memory · network · storage.

02/

AI Workloads Validate

MLPerf benchmarks + XPerf database. Catches what standard testing misses.

03/

Stress Testing — Push to Limits

XPerf’s patented stress testing algorithm. Every server and every link, at peak. Surfaces marginal hardware before customers do.

04/

Production Ready — Fix & Optimize

Suggested remedy on every finding. Suggested optimization guidance.

What We Validate

Validated at Every Layer

ClusterReady tests the rack as a system — hardware, software, and AI workloads — not isolated components.

Hardware

GPU count, model & health.
PCIe speed, width & ACS.
IB / RoCE ports & link quality.
Transceiver & cabling health.
Storage filesystem I/O.
Thermals & power draw.

Software

OS, kernel & IOMMU config.
GPU drivers (NVIDIA & AMD).
GPUDirect / RDMA stack.
Container GPU passthrough.
Firewall & security state.
Cross-node connectivity.

AI Workloads

Tokens / sec throughput.
Watt / token efficiency.
TTFT, TPOT & p50–p99 latency.
Training MFU % & convergence.
Multi-node scaling.
LLM (LLaMA, Mixtral), Text-to-Image (FLUX), Text-to-Video (Wan2.2), Text-to-Speech (Whisper).

Outcome

Weeks of Uncertainty → Hours of Confidence

Before·L11/L12 only

2–6 weeks to validate.
Components only — no system test.
No AI workloads.
Silent failures missed.
Manual triage.

Avg. validation: 2–6 weeks

After·With ClusterReady

Hours or days to validate.
Full rack/cluster tested as a system.
Inference + training workloads.
Hidden failures caught pre-shipment.
Root cause + remedy + optimization.

Avg. validation: Hours to Days

Validation Landscape

Why ClusterReady Wins

ClusterReady complements L11/L12 testing — what L11/L12 ships, ClusterReady proves at the system level.

Capability	L11/L12 Testing	NVIDIA / AMD Tools	OEM Built-in (Dell, SMC)	Custom Scripts	Aptly Technology	XPerf ClusterReady
Test full rack?	No	No	Yes	Varies by team	SMC + NVIDIA only	Yes
Use AI workloads?	No	Synthetic only	Dell yes / SMC partial	Some custom	Yes	Yes
Automated diagnosis + suggested remedy?	No	No	No	No	No	Yes
Serve any OEM / ODM?	No	Yes	No	No	Yes	Yes
Time to validate	2–6 weeks	Hours	Days–weeks	Weeks	Weeks	2–8 hours for healthy racks

XPerf ClusterReady

Test full rack?: Yes
Use AI workloads?: Yes
Automated diagnosis + suggested remedy?: Yes
Serve any OEM / ODM?: Yes
Time to validate: 2–8 hours for healthy racks

L11/L12 Testing

Test full rack?: No
Use AI workloads?: No
Automated diagnosis + suggested remedy?: No
Serve any OEM / ODM?: No
Time to validate: 2–6 weeks

NVIDIA / AMD Tools

Test full rack?: No
Use AI workloads?: Synthetic only
Automated diagnosis + suggested remedy?: No
Serve any OEM / ODM?: Yes
Time to validate: Hours

OEM Built-in (Dell, SMC)

Test full rack?: Yes
Use AI workloads?: Dell yes / SMC partial
Automated diagnosis + suggested remedy?: No
Serve any OEM / ODM?: No
Time to validate: Days–weeks

Custom Scripts

Test full rack?: Varies by team
Use AI workloads?: Some custom
Automated diagnosis + suggested remedy?: No
Serve any OEM / ODM?: No
Time to validate: Weeks

Aptly Technology

Test full rack?: SMC + NVIDIA only
Use AI workloads?: Yes
Automated diagnosis + suggested remedy?: No
Serve any OEM / ODM?: Yes
Time to validate: Weeks

Supported Hardware

Works With the GPUs You Ship Today

Coverage across both major accelerator vendors and current network fabrics, with new SKUs added continuously.

MI300X | MI325X | MI350X | MI355X

256GB HBM3e validated | RoCE/RDMA

GB300 | B300 | H200 | H100

NVLink 4/5 validated | InfiniBand | NVLink, RoCE

Case Study

AMD MI325X Rack — First Public Performance Result

4 nodes · 32× AMD MI325X GPUs · 256 GB HBM3e · RoCE / RDMA

FLUX.1-DEV+20%

vs AMD’s previous performance results.

Multi-Node95.8%

scaling efficiency from 1 node → 4 nodes.

Llama 2 70B33,695

tokens/sec inference, single node.

RCCL10×

bandwidth gain from RCCL config optimization.

The first published Llama 3.1 pretraining performance result on MI325X submitted to MLPerf Training v6.0.

Get Started

Download ClusterReady

Available on macOS, Windows, and Linux. Pick your platform to grab the latest build.

Windows

x64

Loading…Windows 10 or later

macOS

Apple Silicon & Intel

Loading…macOS 11.0 or later

Linux

x64

Loading…Ubuntu 22.04 or compatible

About

About XPerf Inc.

Founded by ex-Intel engineers with extensive experience deploying clusters with thousands of accelerators, XPerf Inc. is the control plane for data centers — AI infrastructure software for GPU cluster performance validation and optimization, taking an AI-native approach to cluster operation problems. Austin / Round Rock, Texas.