One primitive — no model changes — GPU · CPU · any platform

AI inference up to 106× faster
with up to 99% less energy.
Same hardware. Same model. One import line.

Run the benchmark
GPU · CPU · any device · ~2 minutes · results signed to your name
Choose benchmark — runs on our server
Nothing installs · works on any device · signed SHA-256 result
Server: 8 vCPU · 32 GB RAM · shared · a fraction of the H200/B200 hardware used for our published 100×+ results
OLMoE-1B-7B: 43× vs cuSPARSE · 2.49× vs cuBLAS · 87.5% natural sparsity · REAL
DeepSeek-V2-Lite: 40× vs cuSPARSE · 2.94× vs cuBLAS · 90.6% natural sparsity · REAL
Phi-3.5-MoE: 74× vs cuSPARSE · 3.38× vs cuBLAS · 87.5% natural sparsity · REAL
Qwen1.5-MoE-A2.7B: 35× vs cuSPARSE · 3.37× vs cuBLAS · 93.3% natural sparsity · REAL
Universal Compatibility

Works on every platform. Today and tomorrow.

NVIDIA · AMD · Intel · ARM · Apple · Google TPU · Custom ASICs · FPGAs · Photonic · Quantum · Any hardware that does matrix multiply.

Live benchmark

Your device vs ROLV. Side by side. Right now.

The left panel runs standard matrix multiply in your browser — your actual hardware. The right panel runs ROLV on our server with identical inputs. Both signed and explained.

Your hardware — this machine runs the baseline
Benchmark server: 8 vCPU · 32 GB RAM · CPU-only · shared with other visitors · a fraction of the NVIDIA H200/B200 hardware where ROLV delivers 100×+ speedups
Your browser — MKL baseline
Standard AI computation — no optimisation
This is what every AI system runs today
ROLV server — same inputs
ROLV Primitive© — server-side, protected
Real AI model weights, 500 iterations, signed result
Free · no install · works on any device · results signed and verifiable
A Story About Waste at Scale

ROLV Makes AI Available to Anyone, Anywhere with a PC.

Picture a container ship crossing the Pacific. It carries 20,000 containers. The manifest says 5,000 of them are empty — have always been empty, will be empty on arrival. But the ship cannot leave them behind. Its loading system was built decades ago and it can only operate one way: load everything, sail everything, unload everything.

It burns fuel in proportion to its total cargo, including the 5,000 empty containers. The crew's workload scales with total cargo. The port fees scale with total cargo. Every crossing. Every time.

This is what cuBLAS does with MoE inference. The empty containers are the inactive experts — architecturally zero, guaranteed by the router, known before the computation starts. cuBLAS has no mechanism to leave them on the dock. It computes all of them, every token, every layer, every inference call.

ROLV Primitive© is the loading system that reads the manifest first. It identifies the empty containers before departure. It sails only what carries cargo. Same destination. Same output. A fraction of the fuel.
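The "read the manifest first" idea can be illustrated with a toy mixture-of-experts layer in NumPy. This is a minimal sketch of the general principle only, not the ROLV Primitive© implementation: the sizes, the random router scores, and all variable names are hypothetical. It shows that evaluating only the experts the router selected gives the same output as evaluating every expert and multiplying the inactive ones by a zero gate.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d_in, d_out = 8, 2, 16, 4  # toy sizes, chosen for illustration

# One weight matrix per expert, plus a single input token.
experts = rng.standard_normal((num_experts, d_in, d_out))
x = rng.standard_normal(d_in)

# Router: keep the top_k scores; every other expert gets a gate of exactly zero.
scores = rng.standard_normal(num_experts)
active = np.argsort(scores)[-top_k:]
gates = np.zeros(num_experts)
gates[active] = np.exp(scores[active]) / np.exp(scores[active]).sum()

# "Load everything": evaluate all experts, then scale most results by zero.
y_all = sum(gates[e] * (x @ experts[e]) for e in range(num_experts))

# "Read the manifest first": evaluate only the experts the router selected.
y_active = sum(gates[e] * (x @ experts[e]) for e in active)

skipped = num_experts - top_k
print(f"skipped {skipped} of {num_experts} experts, outputs match: {np.allclose(y_all, y_active)}")
```

Because the router's choice is known before any expert is evaluated, the skipped work never has to be scheduled at all.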

The numbers behind the analogy
DeepSeek-V3: 256 experts, top-8 active
248 empty containers per token · 96.9% of all compute wasted by cuBLAS
ROLV Primitive© computes only the 8 active experts, exactly
8.76× faster · 110× vs cuSPARSE · PASS

Mixtral-8×7B: 8 experts, top-2 active
6 empty containers per token · 75% of all compute wasted by cuBLAS
ROLV computes only the 2 active experts, exactly
1.86× faster · 109× vs cuSPARSE · PASS
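The percentages above follow directly from the expert counts. As a small worked example (the function name `natural_sparsity` is ours, used only for illustration), the share of experts that receive no work for a token is (experts − top-k) / experts:

```python
def natural_sparsity(num_experts: int, top_k: int) -> float:
    """Fraction of experts that are inactive for a single token."""
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    return (num_experts - top_k) / num_experts


# Expert counts as stated above.
for name, n, k in [("DeepSeek-V3", 256, 8), ("Mixtral-8x7B", 8, 2)]:
    inactive = n - k
    print(f"{name}: {inactive} inactive experts, {natural_sparsity(n, k):.1%} of expert compute skippable")
```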

Every frontier model crossing the Pacific today carries empty containers. ROLV leaves them on the dock.

Benchmarks — Real Weights · SHA-256 Verified · 1,000 iters

Full results. Every claim verified.

Model | Src | Natural sparsity | vs cuBLAS | vs cuSPARSE | Energy saved | Tokens/s | PASS
Mixtral-8×7B | REAL | 75.0% | 1.86× | 109× | 46% | 2,185,075 |
Mixtral-8×22B | synth | 75.0% | 2.43× | 107× | 59% | 1,073,568 |
Qwen2-57B-A14B | synth | 87.5% | 3.37× | 70× | 70% | 2,374,040 |
Qwen3-30B-A3B | REAL | 93.8% | 3.43× | 32× | 71% | 6,650,774 |
Llama-4-Scout ★ | REAL | 93.8% | 4.75× | 103× | 79% | 5,795,875 |
DeepSeek-V3/R1 | synth | 96.9% | 8.76× | 110× | 89% | 1,758,046 |

NVIDIA B200 · BF16 · TF32 ON · 1,000 iters · ATOL=0.05 col-norm fp64 · 4 SHA-256 hashes + perturbation PASS
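The energy percentages in the table above are numerically consistent with a simple model: if power draw stays constant and a run finishes `speedup` times sooner, energy falls by 1 − 1/speedup. This is our observation about the published numbers, not a confirmed description of how they were measured, and the function name is hypothetical.

```python
def energy_saved_pct(speedup: float) -> float:
    """Energy reduction in percent, assuming constant power draw during the run."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 100.0 * (1.0 - 1.0 / speedup)


# model: (speedup vs cuBLAS, energy % as listed in the table above)
published = {
    "Mixtral-8x7B": (1.86, 46),
    "Mixtral-8x22B": (2.43, 59),
    "Qwen2-57B-A14B": (3.37, 70),
    "Qwen3-30B-A3B": (3.43, 71),
    "Llama-4-Scout": (4.75, 79),
    "DeepSeek-V3/R1": (8.76, 89),
}

for model, (speedup, table_pct) in published.items():
    print(f"{model}: model gives {energy_saved_pct(speedup):.1f}%, table lists {table_pct}%")
```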

Model / Layer | GPU | Sparsity | vs cuBLAS | vs vendor sparse | PASS
LLaMA-3.1-8B up_proj [REAL] | H200 | 80% | 2.17× | 9.53× |
LLaMA-3.1-8B up_proj [REAL] | H200 | 90% | 2.79× | 8.66× |
DeepSeek-R1 embed [REAL] | B200 | 95% | 19.42× | 19.42× |
10k×10k synthetic | B200 | 70% | 3.11× | 12.06× |
10k×10k synthetic | MI300X | 85% | 8.5× | 83.77× |
Tesla T4 synthetic | T4 | 90% | 5.8× | 14.2× |

1,684/1,684 total PASS · BF16 · TF32 ON · ATOL=0.05 · AMD MI300X: rocBLAS 8.5× (rocSPARSE has known regression at high sparsity)

Model / Layer | CPU | Sparsity | vs MKL (iter) | vs MKL (total+build) | Energy↓ | PASS
Mistral-7B q_proj [REAL] | Intel i7 | 95% | 21.45× | 18.58× | 95% |
Qwen3-8B down_proj [REAL] ★ | Intel i7 | 95% | 20.86× | 17.88× | 95% |
Gemma4-E4B up_proj [REAL] ★ | Intel i7 | 95% | 19.56× | 17.29× | 95% |
Llama-3.1-8B q_proj [REAL] ★ | Intel i7 | 95% | 24.44× | 22.20× | 96% |
Qwen2.5-7B gate_proj [REAL] ★ | Intel i7 | 95% | 59.70× | | 98% |
SmolLM2-1.7B · Qwen2.5-1.5B · Llama-3.2-1B on Colab Xeon · 125/125 PASS at 70–99% induced sparsity
SmolLM2-1.7B gate_proj [REAL] | Xeon Colab | 95% | 27.26× | | 96% |
Llama-3.2-1B down_proj [REAL] ★ PEAK | Intel i7 | 99% | 106.65× | 9.07× | 99% |
TOTAL CPU: 9 models · 332/332 PASS · Avg 7.37× · Peak 106.65×

Intel i7 laptop (4 cores, 68GB RAM) · Mistral-7B + Qwen3-8B + Gemma4-E4B + Phi-4 + DeepSeek-R1-7B + Qwen2.5-7B + Llama-3.2-3B + Llama-3.1-8B + Gemma-2-2B real HuggingFace weights · MKL baseline · Speedup includes ROLV build time · 252/252 PASS (i7) + 125/125 PASS (Colab Xeon wheel, 5-level) = 377/377 total · 377/377 perturbation PASS · 1,000 iters · ATOL=0.05

Hardware | Matrix | Sparsity | cuSPARSE (ms) | ROLV (ms) | ROLV wins | PASS
NVIDIA H200 | LLaMA up_proj | 80% | 5.90 | 0.619 | 9.53× |
NVIDIA H200 | LLaMA up_proj | 90% | 3.01 | 0.348 | 8.66× |
NVIDIA B200 | Mixtral-8×7B MoE | 75% | 25.65 | 0.234 | 109× |
NVIDIA B200 | Llama-4-Scout MoE | 94% | 9.14 | 0.088 | 103× |
NVIDIA B200 | 10k×10k synthetic | 70% | 4.31 | 0.36 | 12.06× |
AMD MI300X | 10k×10k synthetic | 85% | 74.27 | 0.89 | 83.77× |
Intel i7 CPU | Mistral-7B q_proj | 95% | 66.4 | 3.18 | 14.01× |

cuSPARSE is NVIDIA’s own sparse library, tuned by hundreds of engineers. ROLV beats it in every configuration listed above because, for LLM weight patterns, a dense matmul on a small submatrix outperforms CSR index lookups. The AMD MI300X sparse baseline is rocSPARSE, which has a known performance regression at high sparsity, so the 8.5× comparison against rocBLAS is published as well.
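The "dense matmul on a small submatrix" idea can be sketched in a few lines of NumPy. This is an illustration of the general technique under a stated assumption (whole rows of the weight matrix are zero), not the ROLV Primitive© algorithm, which is not published here. The sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, batch = 1000, 512, 32  # toy sizes

W = rng.standard_normal((rows, cols))
W[rng.random(rows) < 0.9] = 0.0            # zero out roughly 90% of the rows
X = rng.standard_normal((cols, batch))

# One-off build step: find the live rows and pack them into a small dense matrix.
live = np.flatnonzero(np.any(W != 0, axis=1))
W_small = np.ascontiguousarray(W[live])

# Baseline: multiply the full matrix, zeros included.
Y_dense = W @ X

# Compacted path: one dense matmul on the packed rows, then scatter the results back.
Y_compact = np.zeros((rows, batch))
Y_compact[live] = W_small @ X

print(f"live rows: {len(live)} of {rows}, outputs match: {np.allclose(Y_dense, Y_compact)}")
```

The compacted path needs no per-element index lookups at multiply time: the indices are consulted once during the build step and once when scattering the output.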

Calculators

Measure. Switch. Save.

Quantify ROLV's impact on your infrastructure. The two primary calculators below cover capital and operating expense; below them, three advanced tools for deeper analysis.

▲ Capex Savings Calculator
Current capex: $3.0B
Units saved: 80,000
Capex saved: $2.4B
Speedup from published ROLV benchmarks on real model weights.
▲ Opex Savings — Energy Calculator
Total cost/yr: $76.5M
Saved/yr: $35M
3-year saving: $105M
CO₂ avoided/yr: 117,000 t
Energy savings based on ROLV benchmark results at stated sparsity level.
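The calculator inputs are not shown on this page, so the sketch below uses hypothetical inputs (100,000 units at $30,000 each and a 5× speedup) picked only because they reproduce the capex figures displayed above. The formula itself is also an assumption: if each remaining unit does `speedup` times the work, the fleet shrinks to units / speedup.

```python
from typing import Tuple


def capex_saved(units: int, unit_cost: float, speedup: float) -> Tuple[int, float]:
    """Units and capital no longer needed if each remaining unit does `speedup`x the work."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    units_needed = round(units / speedup)
    units_saved = units - units_needed
    return units_saved, units_saved * unit_cost


# Hypothetical inputs chosen to match the displayed $3.0B / 80,000 / $2.4B.
units_saved, dollars_saved = capex_saved(units=100_000, unit_cost=30_000, speedup=5.0)
print(f"units saved: {units_saved:,}, capex saved: ${dollars_saved / 1e9:.1f}B")

# The displayed 3-year opex saving is three times the yearly saving.
yearly_saving = 35e6
three_year_saving = 3 * yearly_saving
print(f"3-year saving: ${three_year_saving / 1e6:.0f}M")
```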
Advanced tools
△ ROLV Unit™ — Measure True Compute Efficiency

The ROLV Unit™ is a normalised measure of compute efficiency that accounts for sparsity. Unlike TFLOPS (which measures peak theoretical throughput) or tokens/s (which conflates hardware and software), the ROLV Unit measures useful compute — work done on non-zero elements only.

1 ROLV Unit = 1 TFLOP of compute on live (non-zero) matrix elements per second, at full precision, verified by SHA-256 hash.

Your Compute in ROLV Units
Without ROLV
562 RU
wasted on zero rows
With ROLV
2,250 RU
all compute is useful
Cluster efficiency gain
4.0× more useful compute — same hardware
ROLV Unit = TFLOPS on verified non-zero elements. Vendor TFLOPS counts all compute including zero rows.
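One reading of the definition above is that ROLV Units equal raw throughput multiplied by the fraction of matrix elements that are non-zero. The sketch below uses that reading with assumed inputs (2,250 raw TFLOPS and 75% sparsity), which happen to reproduce the displayed 562 RU, 2,250 RU and 4.0× figures. Both the inputs and the function name are ours.

```python
def rolv_units(raw_tflops: float, sparsity: float) -> float:
    """Useful TFLOPS: the share of raw throughput spent on non-zero elements."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return raw_tflops * (1.0 - sparsity)


raw_tflops = 2250.0   # assumed cluster throughput
sparsity = 0.75       # assumed share of zero rows

without_rolv = rolv_units(raw_tflops, sparsity)  # zero rows are still computed
with_rolv = raw_tflops                           # zero rows skipped, so all work is useful
gain = with_rolv / without_rolv

print(f"without: {without_rolv} RU, with: {with_rolv} RU, gain: {gain}x")
```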
▶ ROLVswitch™ & VRAM — Crossover & Memory Calculator

ROLVswitch™ finds the exact sparsity where ROLV beats dense, and whether your matrix fits in VRAM.

ROLVswitch Analysis
Switch to ROLV above
VRAM analysis
At your sparsity
■ RSMT™ — Sparse Storage Threshold Calculator

RSMT™ finds the exact sparsity threshold where sparse storage beats dense for your dtype.

Why RSMT™ Matters

The crossover point depends entirely on your dtype. With bfloat16 values (2 bytes) and int32 indices (4 bytes), a sparse format stores 6 bytes per non-zero, 3× the 2 bytes that dense storage spends per element. Sparse wins only when you have enough zeros to overcome the index overhead.

Your MoE models at bfloat16
Mixtral-8×7B: 75%  ✓ well above crossover
Qwen3-30B-A3B: 93.8%  ✓ far above crossover
Llama-4-Scout: 93.8%  ✓ far above crossover
DeepSeek-V3: 96.9%  ✓ extreme advantage

RSMT™ is computed analytically — no approximation.
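A storage crossover of this kind can be derived in one line. Assuming one index is stored per non-zero and ignoring row-pointer arrays (a simplification of ours, not necessarily the exact RSMT™ formula), sparse storage costs (1 − s)·(v + i) bytes per element against v bytes for dense, so sparse is smaller when s > i / (v + i). The function name is hypothetical.

```python
def sparse_storage_threshold(value_bytes: int, index_bytes: int) -> float:
    """Sparsity above which value+index storage is smaller than dense storage.

    Assumes one index per stored non-zero; row-pointer arrays are ignored.
    """
    if value_bytes <= 0 or index_bytes < 0:
        raise ValueError("byte sizes must be positive")
    return index_bytes / (value_bytes + index_bytes)


threshold = sparse_storage_threshold(value_bytes=2, index_bytes=4)  # bfloat16 + int32
print(f"bfloat16/int32 crossover: {threshold:.1%} sparsity")

# Natural sparsities listed above.
models = {"Mixtral-8x7B": 0.75, "Qwen3-30B-A3B": 0.938, "Llama-4-Scout": 0.938, "DeepSeek-V3": 0.969}
for name, s in models.items():
    print(f"{name}: {s:.1%} is {'above' if s > threshold else 'below'} the crossover")
```

Under these assumptions the bfloat16/int32 crossover sits at two thirds sparsity, which is why every model in the list clears it.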

Enterprise & Institutional Evaluation

Evaluate on your own hardware.
NDA-gated. Hardware-locked. Signed every run.

Two deployment tiers for serious evaluation on your own models, your own data, your own processors. If you just want to see ROLV working end-to-end first, the live benchmark above runs in under two minutes with no install. All enterprise runs are RolvKey™-signed — SHA-256 over your speedup, processor fingerprint, and a time-bounded attestation.
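As a rough picture of what "SHA-256 over your speedup, processor fingerprint, and a time-bounded attestation" could look like, the sketch below hashes a canonical JSON encoding of the three fields. The field names, the encoding, and the use of a plain digest are all assumptions made for illustration. The actual RolvKey™ format is proprietary and is not described on this page.

```python
import hashlib
import json


def result_digest(speedup: float, processor_fingerprint: str, attestation_window: int) -> str:
    """SHA-256 hex digest over a canonical encoding of the three result fields."""
    payload = json.dumps(
        {
            "speedup": speedup,
            "processor_fingerprint": processor_fingerprint,
            "attestation_window": attestation_window,
        },
        sort_keys=True,              # fixed key order keeps the digest reproducible
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest_a = result_digest(8.76, "hypothetical-cpu-id", attestation_window=170_000_000)
digest_b = result_digest(8.76, "hypothetical-cpu-id", attestation_window=170_000_001)
print(digest_a)
print("changes with the time window:", digest_a != digest_b)
```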

Recommended
Secure Container

RolvKey™ authenticated.
Hardware-locked Docker.

Evaluation licence + NDA. Container binds to your processor fingerprint at first run — will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments.

Contact rolv@rolv.ai →
Direct Hardware

No Docker.
Single authenticated file.

Bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.

Contact rolv@rolv.ai →
RolvKey™ — New IP — Patent Pending

A second invention, born from protecting the first.

In building the secure distribution system for ROLV Primitive© we developed a novel software protection architecture that we believe has standalone commercial value entirely apart from ROLV itself.

RolvKey™ uses a proprietary multi-layer mathematical key derivation system. Every key exchange is unique and time-bounded to a window of seconds. A captured response is worthless moments later. An attacker who somehow breaks the first layer immediately faces a second independent layer, then a third — each seeded with a completely different secret.

The only viable attack requires simultaneously compromising multiple independent systems within a narrow time window. For any commercial adversary, this is not a realistic attack.
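The general pattern of a time-bounded, multi-layer key exchange can be sketched with Python's standard `hmac` module. This is a generic illustration and not the RolvKey™ design, which is proprietary: the chaining scheme, the 10-second window, and every name below are assumptions. Each layer applies HMAC-SHA256 with its own secret, and the current time window is mixed into the input so a captured response stops verifying once the window rolls over.

```python
import hashlib
import hmac
from typing import Sequence


def layered_response(secrets: Sequence[bytes], challenge: bytes, now: float, window_s: int = 10) -> str:
    """Chain one HMAC-SHA256 per secret, bound to the current time window."""
    window = int(now // window_s)
    digest = challenge + window.to_bytes(8, "big")
    for secret in secrets:  # each layer is keyed with an independent secret
        digest = hmac.new(secret, digest, hashlib.sha256).digest()
    return digest.hex()


def verify(secrets: Sequence[bytes], challenge: bytes, response: str, now: float, window_s: int = 10) -> bool:
    """Recompute the response for the verifier's clock and compare in constant time."""
    expected = layered_response(secrets, challenge, now, window_s)
    return hmac.compare_digest(expected, response)


layer_secrets = [b"layer-one-secret", b"layer-two-secret", b"layer-three-secret"]
response = layered_response(layer_secrets, b"nonce-123", now=1000.0)

print("same window:", verify(layer_secrets, b"nonce-123", response, now=1003.0))
print("later window:", verify(layer_secrets, b"nonce-123", response, now=1015.0))
```

A production scheme would also need clock-skew tolerance and secure secret storage, both of which are outside the scope of this sketch.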

Market opportunity

Every software company shipping proprietary compiled code faces the same distribution security problem. Current solutions — hardware dongles, standard license servers, code obfuscation — have well-documented weaknesses. The academic literature identified this specific application — software distribution key management and API attestation — as commercially unsolved. RolvKey™ addresses it.

Live right now

RolvKey™ is protecting ROLV Primitive© today. Every Docker container download, every key exchange, every benchmark run on every machine worldwide is secured by this system. It has been exercised thousands of times in production.

Licensing and partnership enquiries: rolv@rolv.ai

Independent Verification

Every result is independently verifiable.

4 SHA-256 hashes per case. Perturbation test on every result. ATOL=0.05 on column-normalised fp64. 1,684/1,684 GPU PASS · 332/332 CPU PASS. Download the full validation kit with harness code, raw outputs, and reproduction instructions.
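A check in the spirit of "ATOL=0.05 on column-normalised fp64" plus a hash and a perturbation test might look like the sketch below. The exact normalisation, which tensors are hashed, and how the perturbation is applied are not specified on this page, so every detail here is an assumption and the helper names are hypothetical.

```python
import hashlib

import numpy as np


def column_normalise(a) -> np.ndarray:
    """Cast to fp64 and scale each column to unit L2 norm (zero columns are left as-is)."""
    a = np.asarray(a, dtype=np.float64)
    norms = np.linalg.norm(a, axis=0, keepdims=True)
    norms[norms == 0.0] = 1.0
    return a / norms


def check_case(reference, candidate, atol: float = 0.05):
    """Return (passed, sha256 hex digest of the normalised candidate output)."""
    ref = column_normalise(reference)
    cand = column_normalise(candidate)
    passed = bool(np.max(np.abs(ref - cand)) <= atol)
    digest = hashlib.sha256(np.ascontiguousarray(cand).tobytes()).hexdigest()
    return passed, digest


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
X = rng.standard_normal((32, 8))

# Reference in fp64 against a lower-precision candidate.
ok, h1 = check_case(W @ X, W.astype(np.float32) @ X.astype(np.float32))

# Perturbation test: changing one input element must change the output hash.
X_perturbed = X.copy()
X_perturbed[0, 0] += 1.0
_, h2 = check_case(W @ X, W @ X_perturbed)

print("within tolerance:", ok, "| hash changes under perturbation:", h1 != h2)
```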

↓ Full Benchmark PDF
Contact

Contact Us

rolv@rolv.ai
Patent Pending · ROLV LLC · Fort Lauderdale, FL