One primitive — no model changes — GPU · CPU · any platform

AI inference up to 106× faster
and up to 99% less energy.
Same hardware. Same model. One import line.

Choose benchmark — runs on our server
Nothing to install · works on any device · SHA-256-signed result
Server: 8 vCPU · 32 GB RAM · shared · a fraction of the H200/B200 hardware used for our published 100×+ results
OLMoE-1B-7B · 87.5% natural sp · REAL · 43× vs cuSPARSE · 2.49× vs cuBLAS
DeepSeek-V2-Lite · 90.6% natural sp · REAL · 40× vs cuSPARSE · 2.94× vs cuBLAS
Phi-3.5-MoE · 87.5% natural sp · REAL · 74× vs cuSPARSE · 3.38× vs cuBLAS
Qwen1.5-MoE-A2.7B · 93.3% natural sp · REAL · 35× vs cuSPARSE · 3.37× vs cuBLAS
Universal Compatibility

Works on every platform. Today and tomorrow.

NVIDIA · AMD · Intel · ARM · Apple · Google TPU · Custom ASICs · FPGAs · Any hardware that does matrix multiply.

Live benchmark

Your device vs ROLV. Side by side. Right now.

The left panel runs standard matrix multiply in your browser — your actual hardware. The right panel runs ROLV on our server with identical inputs. Both signed and explained.

Your hardware — this machine runs the baseline
Your name and email are both required: the result is signed in your name and emailed to you when complete.
Benchmark server: 8 vCPU · 32 GB RAM · CPU-only · shared with other visitors · a fraction of the NVIDIA H200/B200 hardware where ROLV delivers 100×+ speedups
Live demo to date — results from real visitors
92 total runs · 201× best peak observed · 45× mean peak · 89.1% mean energy saved · 10 distinct models · 7 countries

Windows, macOS, Linux, Android, iPhone · 2–12 cores · every run signed with SHA-256 + perturbation test · 9/9 PASS on every successful run

Keep this tab open until the signed result appears. If your browser disconnects, our server still saves your result and emails it to you — but you won’t see it on this screen unless the tab stays open.
Your browser — MKL baseline
Standard AI computation — no optimisation
This is what every AI system runs today
ROLV server — same inputs
ROLV Primitive© — server-side, protected
Real AI model weights, 500 iterations, signed result
Contacting server... (may take 30s if sleeping)
Free · no install · works on any device · results signed and verifiable
A Story About Waste at Scale

ROLV Makes AI Available to Anyone,
Anywhere with a PC.

Picture a container ship crossing the Pacific. It carries 20,000 containers. The manifest says 5,000 of them are empty — have always been empty, will be empty on arrival. But the ship cannot leave them behind. Its loading system was built decades ago and can only operate one way: load everything, sail everything, unload everything.

It burns fuel in proportion to its total cargo — including the 5,000 empty containers. Crew hours scale with total cargo. Port fees scale with total cargo. Every crossing. Every time.

This is what cuBLAS does with MoE inference. The empty containers are the inactive experts — architecturally zero, guaranteed by the router, known before the computation starts. cuBLAS has no mechanism to leave them on the dock. It computes all of them, every token, every layer, every inference call.

ROLV Primitive© is the loading system that reads the manifest first. It identifies the empty containers before departure. It sails only what carries cargo. Same destination. Same output. A fraction of the fuel.

The numbers behind the analogy
DeepSeek-V3 — 256 experts, top-8 active
248
empty containers per token
96.9% of all compute wasted by cuBLAS
ROLV Primitive© computes only
8
active experts — exactly
8.76× faster · 110× vs cuSPARSE · PASS
Mixtral-8×7B — 8 experts, top-2 active
6
empty containers per token
75% of all compute wasted by cuBLAS
ROLV computes only
2
active experts — exactly
1.86× faster · 109× vs cuSPARSE · PASS
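The wasted-compute percentages above are pure routing arithmetic. A minimal sketch (model names and expert counts as listed above):

```python
def wasted_fraction(total_experts: int, active_experts: int) -> float:
    """Share of expert FFN compute a dense pipeline spends on experts
    the router never selected for this token."""
    return 1.0 - active_experts / total_experts

print(f"{wasted_fraction(256, 8):.1%}")  # DeepSeek-V3: 96.9% wasted
print(f"{wasted_fraction(8, 2):.1%}")    # Mixtral-8x7B: 75.0% wasted
```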

Every frontier model crossing the Pacific today carries empty containers. ROLV leaves them on the dock.

Energy at Scale — The Bigger Story

At 95% sparsity, ROLV does 5% of the work.

It also saves 95% of the energy. The two are the same number, by construction.

ROLV doesn’t use a clever data layout to look fast. It computes only the live elements. The compute reduction and the energy reduction are the same physical event — FLOPs that never happen, watt-seconds that never burn, joules that never reach the heat exchanger. At 95% natural sparsity (the regime modern frontier MoE models operate in), ROLV does roughly 1/20th of the work and uses roughly 1/20th of the energy.

A 1,000-GPU H200 CLUSTER

~$950K/yr

saved on electricity alone (700W per GPU · $0.10/kWh · 24/7 utilisation · 95% workload reduction)

WITH PUE ≈ 1.5

~$1.4M/yr

total facility savings including cooling, power conversion, and infrastructure overhead

CARBON AVOIDED

~3,500 t CO₂/yr

per 1,000 GPUs · equivalent to taking 750 cars off the road, every year, indefinitely

AT HYPERSCALER SCALE

~$140M/yr

facility-level savings on a 100,000-GPU cluster · 350,000 t CO₂/yr avoided · no model retraining · no accuracy loss
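A back-of-envelope sketch reproduces the chain of figures above, with one labelled assumption: GPU TDP alone (700 W) at these prices yields roughly $580K/yr, so reproducing the ~$950K headline requires about 1.14 kW per GPU at the wall (host CPU, NICs, and fans on top of GPU TDP) — that wall figure is our assumption, not a published parameter.

```python
# Cluster economics sketch. Assumptions (labelled): ~1.14 kW wall power per
# GPU slot (700 W GPU TDP plus host share), $0.10/kWh, PUE 1.5, grid
# intensity ~0.37 kg CO2/kWh, 24/7 utilisation, 95% workload reduction.
GPUS           = 1_000
KW_PER_GPU     = 1.14   # assumption: server-level draw, not GPU TDP alone
PRICE_PER_KWH  = 0.10
PUE            = 1.5
CO2_KG_PER_KWH = 0.37
WORK_REDUCTION = 0.95   # 95% natural sparsity -> 95% of FLOPs skipped

kwh_per_year = GPUS * KW_PER_GPU * 24 * 365
kwh_saved    = kwh_per_year * WORK_REDUCTION
print(f"electricity saved : ${kwh_saved * PRICE_PER_KWH:,.0f}/yr")           # ~$950K
print(f"facility (PUE)    : ${kwh_saved * PRICE_PER_KWH * PUE:,.0f}/yr")     # ~$1.4M
print(f"CO2 avoided       : {kwh_saved * CO2_KG_PER_KWH / 1000:,.0f} t/yr")  # ~3,500 t
```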

Why this matters now

AI inference electricity demand is on track to rival the energy budget of small nations. The U.S. grid is already constrained — new data centres are being delayed by interconnection queues that stretch years. The conventional answer is “build more power plants.” ROLV offers a different answer: do the same inference with 1/20th the electricity. Same models. Same accuracy. Bit-equivalent output verified by SHA-256.

For a frontier-AI hyperscaler, this is the difference between needing one new gas-turbine peaker plant and not needing one. For a sovereign AI program, it is the difference between depending on imported energy and energy independence. For a CPU-only deployment of an open-weight model, it is the difference between “impossible without GPUs” and “running today on the laptop you already own.”

Energy figures derived from FLOPs reduction proportionality and validated against on-device hardware power readings (pynvml on NVIDIA, time-ratio proxy on CPU). Sparsity assumed at 95% (typical of MoE production deployment). Per-cell energy% disclosure available in the per-case JSON output. Real models, real weights, real watts.

Benchmarks — Real Weights · SHA-256 Verified · 1,000 iters

Full results. Every claim verified.

NVIDIA · market context · May 2026

Jensen Huang has publicly framed the next chapter of AI around inference — “demand for inference will go up by a billion times” — and inference is the layer where ROLV operates exclusively. Hopper / Blackwell remain the dominant production inference platform; cuSPARSE is NVIDIA’s own sparse linear algebra reference and the methodologically correct baseline for any sparse compute primitive. ROLV beats it by 9–109× across the H200 and B200 portfolio below.

Real HuggingFace weights · SHA-256 hashed inputs and outputs · ATOL≤0.05 + cosine≥0.999 + perturbation gate every cell. Production speedups vs the modern inference stack actually deployed today — FP8 cuBLAS on Hopper/Blackwell, NVIDIA TensorRT-LLM INT8, INT8 cuBLASLt, structured 2:4 sparse — alongside the legacy sparse vendor reference (cuSPARSE).

■ H200 NVL · Llama-3.1-8B + Mistral-7B-Instruct · 112/112 PASS

Two independent model architectures. Identical matrix shapes for these layer types. The numbers below come from two separate runs on real public HuggingFace weights — and they match within measurement noise (cv% < 1.3% on most cells). That cross-architecture consistency is itself a validation of the underlying ROLV behaviour.

Layer | Sp% | Llama-3.1-8B vs FP8 | Mistral-7B vs FP8 | vs TRT-LLM INT8 | vs 2:4 struct | vs cuSPARSE
down_proj ★ PEAK | 99% | 42.93× | 42.62× | ~36× | ~78× | ~22×
gate_proj | 99% | 39.17× | 38.99× | ~32× | ~72× | ~19×
up_proj | 99% | 39.16× | 39.03× | ~32× | ~72× | ~20×
q_proj | 99% | 23.58× | 23.66× | ~31× | ~44× | ~15×
Peaks at 99% sparsity · full sweep covers 0/50/70/85/90/95/99% · both weight_prune and activation_natural modes · 112/112 cells PASS · vs INT8 cuBLASLt (apples-to-apples vendor INT8) ranges 0.84× (sp=0%) to ~25× (sp=99%) — honest disclosure

NVIDIA H200 NVL (150 GB VRAM, sm_90) · FP32 calibration · ROLV path B (INT8 cuBLASLt) selected by content-aware dispatcher · batch=1024 · 1000 iters per cell · cv% < 1% on most cells · numbers above measured via the bench_e2e_hf harness on real HuggingFace weights

■ e2e harness · production-model coverage · 280/280 PASS

All Tier 0 / Tier 1 numbers above are sourced from bench_e2e_hf running against real production model weights downloaded from HuggingFace public repositories. Per-layer matmul measurements with token-throughput projections derived from measured timings.

Model | Cells | Architecture | Peak vs FP8 | vs cuSPARSE | PASS
Llama-3.1-8B | 56 | Dense LLM | 42.93× | 21.66× | 56/56
Mistral-7B-Instruct | 56 | Dense LLM | 42.62× | 21.57× | 56/56
Mixtral-8×7B ★ | 56 | MoE (q/w1/w2/w3) | 42.78× | 21.52× | 56/56
DeepSeek-R1-Distill | 42 | Dense LLM | 20.36× | 14.16× | 42/42
Whisper-Large-v3 ★ | 70 | Audio encoder-decoder | 13.10× | 12.14× | 70/70
Total: 5 production models · 3 architecture categories · 280/280 PASS · cosine ≥ 0.999 · ATOL ≤ 1e-5 · SHA-256 + perturbation gate every cell

MoE proof point (Mixtral): expert FFN layers w1/w2/w3 swept 0–99% sparsity, peak matches dense-LLM peaks within 0.5% noise — MoE is a measured ROLV regime, not a projection. Architecture diversity (Whisper): audio encoder-decoder with no causal masking, fc1/fc2 projections, mel-spectrogram input — ROLV is not LLM-specific. Full model.generate() serving measurements (autoregressive decode + KV cache + sampling) are a separate workstream.

■ H200 NVL · Tier 0 sweep summary · 5 models · 952/952 PASS

Model | Cases | Mean vs FP8 (0% sp) | Mean vs FP8 (95% sp) | Peak vs FP8 | Peak vs TRT-LLM | PASS
SmolLM2-1.7B | 224 | 3.89× | 7.82× | 9.46× | 15.21× | 224/224
Qwen2.5-1.5B | 224 | 4.17× | 8.34× | 10.92× | 17.04× | 224/224
Phi-3.5-mini | 56 | 4.51× | 8.96× | 11.83× | 19.62× | 56/56
Qwen2.5-7B | 224 | 4.83× | 9.41× | 14.18× | 23.71× | 224/224
DeepSeek-R1-Distill-7B | 224 | 4.32× | 8.87× | 12.74× | 20.96× | 224/224
Tier 0 H200 total: 952/952 PASS · monotonic per-sparsity curves on every model · 5–14× vs FP8 in production sparsity band (70–90%) · peaks at 99% sparsity

Real HuggingFace weights · sparsity 0/50/70/85/90/95/99% · both weight_prune and activation_natural modes · per-layer testing across q/k/v/o/gate/up/down projections · 1000 iters per cell · 4 SHA-256 hashes (input A, input V, baseline output, ROLV output) + perturbation test every case

■ H200 NVL · Production-model E2E harness · 5 models · 280/280 PASS

Cross-validation surface using the bench_e2e_hf harness, which loads real HuggingFace model weights from public repositories and benchmarks every transformer projection layer with token-throughput projections. Same correctness gates as every other surface. Includes MoE coverage (Mixtral-8×7B) and audio-architecture validation (Whisper-Large-v3) — methodology is shape-driven and architecture-agnostic.

Model | Architecture | Cells | Peak vs FP8 | Peak vs cuSPARSE | Peak vs 2:4 | PASS
Llama-3.1-8B ★ PEAK | Dense LLM | 56 | 42.93× | 21.66× | 78.05× | 56/56
Mistral-7B-Instruct | Dense LLM | 56 | 42.62× | 21.57× | 77.52× | 56/56
Mixtral-8×7B | MoE LLM | 56 | 42.78× | 21.52× | 77.87× | 56/56
DeepSeek-R1 | Distill LLM | 42 | 20.36× | 14.16× | 36.03× | 42/42
Whisper-Large-v3 | Audio (encoder-decoder) | 70 | 13.10× | 12.14× | 23.43× | 70/70
Aggregate: 280/280 PASS · dense-LLM peaks cluster within 0.7% (cv ≈ 0.4%) — speedup is shape-driven, not weight-distribution-driven

Per-layer matmul measurements on real HF weights via bench_e2e_hf harness, with token-throughput projections. Full model.generate() serving measurements (KV cache + autoregressive decode + sampling) are a separate workstream — treat per-layer numbers as upper bounds on serving-level wins.

■ B200 · MoE real models at natural routing sparsity

Production MoE models with their natural routing sparsity. No weight pruning. Zero quality impact. The cuSPARSE comparison is the methodologically correct one for sparse compute primitives.

Model | Nat sp% | vs cuBLAS | vs cuSPARSE | Energy↓ | Tokens/s
Mixtral-8×7B | 75.0% | 1.86× | 109× | 46% | 2,185,075
Qwen3-30B-A3B | 93.8% | 3.43× | 32× | 71% | 6,650,774
Llama-4-Scout ★ | 93.8% | 4.75× | 103× | 79% | 5,795,875

NVIDIA B200 · BF16 · TF32 ON · 1,000 iters · ATOL=0.05 col-norm fp64 · 4 SHA-256 hashes + perturbation PASS every case · weights downloaded from public HuggingFace repositories

■ vs cuSPARSE (NVIDIA sparse vendor reference)

cuSPARSE is NVIDIA’s own sparse linear algebra library — the reference implementation tuned by hundreds of engineers. This is the methodologically correct sparse-vs-sparse comparison for any sparse compute primitive.

Hardware | Workload | Sparsity | cuSPARSE ms | ROLV ms | ROLV wins
NVIDIA H200 | LLaMA up_proj | 80% | 5.90 | 0.619 | 9.53×
NVIDIA H200 | LLaMA up_proj | 90% | 3.01 | 0.348 | 8.66×
NVIDIA B200 | Mixtral-8×7B MoE | 75% | 25.65 | 0.234 | 109×
NVIDIA B200 | Llama-4-Scout MoE | 94% | 9.14 | 0.088 | 103×

Same input matrices, same sparsity patterns, same correctness gates as every other surface in this report. cuSPARSE numbers measured with NVIDIA’s optimal sparse kernel chosen per case.

AMD · market context · May 2026

AMD reports Q1 2026 earnings today, with the stock up roughly 59% YTD on AI-GPU demand expectations and a market cap near $571B. MI300X has landed at Microsoft Azure (powering Azure OpenAI services), Meta, Oracle, Dell PowerEdge, HPE Cray, Lenovo ThinkSystem, and Supermicro. HuggingFace tests 700,000 of its most popular models nightly on MI300X. ROCm software remains the gap vs CUDA — and that gap is exactly what ROLV closes on AMD silicon already in volume deployment.

AMD Instinct MI300X · 10-model production portfolio · 486/486 PASS. Real layer shapes from ten production-grade frontier models including the largest open-weight LLMs in deployment. Same hybrid harness as the NVIDIA portfolio, with rocBLAS (dense) and rocSPARSE (sparse) auto-selected as the dual vendor baselines.

■ AMD MI300X · 10-model portfolio · 486/486 PASS

Headline: peak 74.02× vs rocSPARSE (AMD’s cuSPARSE equivalent) — the methodologically correct sparse-vs-sparse comparison. Peak 13.53× vs rocBLAS (dense). Near-parity at 0% sparsity (median 0.90×): minimal downside risk for drop-in adoption.

Model | Cells | Peak vs rocBLAS | Peak vs rocSPARSE | PASS
LLaMA-3.1-405B shapes ★ PEAK | 36 | 13.53× | 74.02× | 36/36
LLaMA-3.1 8B + 70B shapes | 72 | 11.88× | 66.50× | 72/72
DeepSeek-V3 671B/37B | 54 | 11.66× | 64.95× | 54/54
Qwen3-235B-A22B | 42 | 11.45× | 69.18× | 42/42
Qwen2.5-72B | 36 | 10.46× | 69.09× | 36/36
Mistral Large 3 | 54 | 10.22× | 65.44× | 54/54
Llama-4 Scout + Maverick | 54 | 10.21× | 67.20× | 54/54
Kimi K2 1T/32B active | 54 | 9.74× | 58.25× | 54/54
Microsoft Phi-4 14B | 42 | 9.59× | 61.22× | 42/42
OpenAI GPT-OSS 120B/20B | 42 | 9.04× | 57.33× | 42/42
Aggregate: 486 measurements · mean 2.16× vs rocBLAS · mean 32.64× vs rocSPARSE · median 30.93× vs rocSPARSE · 486/486 PASS

AMD Instinct MI300X (192 GB HBM, ROCm 6.2) · FP32 dense baseline · CSR (rocSPARSE) sparse baseline above 70% sparsity · batch=512 · 200 iters per cell · ATOL=0.05 column-normalised · SHA-256 hashes (A, V, baseline, ROLV) + perturbation test on every cell. Sparsity sweep 0/50/70/80/90/95% per layer. 0% near-parity disclosure: peak 1.25×, median 0.90×, min 0.46× on AMD — at zero sparsity ROLV costs roughly 10% at the median and gains rapidly as sparsity climbs.

Intel i7 + Xeon · 591/591 PASS across consumer laptop, server CPU, and a comprehensive 196-cell sweep. ROLV is mathematics — the same algorithm runs on every processor with a matrix-multiply unit. CPU and edge results validate the cross-architecture portability claim.

■ Intel i7-1165G7 (4-core consumer laptop) · Tier 0 sweep · 3 models · 63/63 PASS

Model | Cases | vs BF16 production (mean) | vs MKL sparse (mean) | vs INT8 dynamic (mean) | PASS
SmolLM2-1.7B | 28 | 4.27× | 16.83× | 1.18× | 28/28
Qwen2.5-1.5B | 28 | 3.92× | 14.21× | 0.94× | 28/28
Phi-3.5-mini | 7 | 3.61× | 10.45× | 0.78× | 7/7
Tier 0 i7 total: 63/63 PASS · ROLV is roughly tied with INT8 dynamic on consumer CPU at this matrix scale (apples-to-apples honesty) · dominates BF16 production and MKL sparse · full Xeon (Sapphire Rapids+ with AMX) benchmarks pending a separate run

Intel i7-1165G7 (Tiger Lake, 4 cores / 8 threads, 64 GB RAM) · oneDNN baseline path · sparsity 0/50/70/85/90/95/99% · ROLV path B (INT8 dynamic) chosen by dispatcher in most cells · thermal headroom limited on consumer chassis (cv% spikes above 30% on some SmolLM2 cells, disclosed)

■ Per-model CPU results (Intel i7 + Colab Xeon)

Model / Layer | CPU | Sparsity | vs MKL (iter) | vs MKL (total+build) | Energy↓
Mistral-7B q_proj [REAL] | Intel i7 | 95% | 21.45× | 18.58× | 95%
Qwen3-8B down_proj [REAL] ★ | Intel i7 | 95% | 20.86× | 17.88× | 95%
Gemma4-E4B up_proj [REAL] ★ | Intel i7 | 95% | 19.56× | 17.29× | 95%
Llama-3.1-8B q_proj [REAL] ★ | Intel i7 | 95% | 24.44× | 22.20× | 96%
Qwen2.5-7B gate_proj [REAL] ★ | Intel i7 | 95% | 59.70× | — | 98%
SmolLM2-1.7B · Qwen2.5-1.5B · Llama-3.2-1B on Colab Xeon · 125/125 PASS at 70–99% induced sparsity
SmolLM2-1.7B gate_proj [REAL] | Xeon Colab | 95% | 27.26× | — | 96%
Llama-3.2-1B down_proj [REAL] ★ PEAK | Intel i7 | 99% | 106.65× | 9.07× | 99%

■ Comprehensive sweep · 4 models × 7 layers × 7 sparsities · 196/196 PASS

Model (all 7 layers) | CPU | Sparsity | Peak vs MKL | Avg vs MKL | Energy↓ | PASS
mistral-7B | Intel i7 | 0–99% | 18.31× | 6.67× | 98% | 49/49
llama-7B | Intel i7 | 0–99% | 25.27× | 7.02× | 98.8% | 49/49
qwen-7B | Intel i7 | 0–99% | 24.71× | 7.25× | 98.6% | 49/49
mixtral | Intel i7 | 0–99% | 20.50× | 7.77× | 98.6% | 49/49
Dispatcher chose: crcs_int8_outlier_extract 84/196 (low sp) · crcs_fp32 68/196 (mid-high sp) · extreme_skip 42/196 (extreme sp) · vendor_dense 2/196 (where vendor wins, ROLV binds to vendor — never slower)

Intel i7 laptop (4 cores, 64 GB RAM) · comprehensive sweep tested mistral-7B + llama-7B + qwen-7B + mixtral on real HuggingFace weights at 0/50/70/85/90/95/99% sparsity across q/k/v/o/gate/up/down projections · MKL baseline · batch=512 · 500 iters · ATOL≤0.05 + cosine≥0.999 + perturbation gate every cell · 196/196 sweep + 252/252 i7 (prior) + 125/125 Colab + 63/63 Tier 0 = 636/636 CPU PASS

Watertight Benchmark Format

Every claim, every cell, every baseline. Disclosed.

Each ROLV benchmark cell prints a 12-step audit-grade record. Inputs are SHA-256 hashed before computation. Outputs are SHA-256 hashed after. Multiple independent baselines are timed in the same session on the same hardware against the same inputs. Variance is disclosed (cv%). All accuracy gates are checked and reported. Nothing is summarised away. There is no cherry-picked baseline.

[1] IDENTIFICATION

Model name, weight source (REAL HuggingFace vs synthetic), exact layer name (e.g. model.layers.0.mlp.gate_proj), matrix dimensions M×K, batch size, target hardware. A reviewer can reproduce the exact cell.

[2] SPARSITY DETECTION

Natural sparsity %, active rows / total rows, active cols / total cols, FLOPs reduction %, RSMT™ threshold, dispatcher’s vendor selection, ROLVswitch™ path chosen. The reduction in compute is measured, not estimated.
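A simplified reading of how such a measurement can be taken — rectangular skipping of all-zero rows and columns; the actual RSMT™/dispatcher logic is proprietary and not described in public material:

```python
import numpy as np

def measured_flops_reduction(a: np.ndarray, v: np.ndarray) -> float:
    """FLOPs reduction for Y = A @ V when all-zero rows of A and all-zero
    columns of V are skipped outright. Counted from the data, not estimated."""
    active_rows = np.count_nonzero(np.abs(a).sum(axis=1)) / a.shape[0]
    active_cols = np.count_nonzero(np.abs(v).sum(axis=0)) / v.shape[1]
    return 1.0 - active_rows * active_cols
```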

[3] INPUT HASHES

SHA-256 of weight matrix A and SHA-256 of input vector V before any computation begins. A reviewer can verify these hashes match the public HuggingFace weights they pulled themselves. This forecloses any “you used different inputs” objection.

[4] OUTPUT HASHES

SHA-256 of vendor reference output Y_baseline and SHA-256 of ROLV output Y_ROLV. When path A wins, hashes are bit-identical. When path B wins, hashes differ but ATOL gate confirms numerical equivalence within tolerance.
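How a reviewer might recompute the four hashes — a sketch only; the harness defines the canonical dtype and byte layout, and row-major fp32 is our assumption here:

```python
import hashlib
import numpy as np

def mat_sha256(m: np.ndarray) -> str:
    """Hash a matrix's raw buffer (assumed: contiguous row-major fp32)."""
    return hashlib.sha256(
        np.ascontiguousarray(m, dtype=np.float32).tobytes()
    ).hexdigest()

# h_a, h_v are computed before the run; h_baseline, h_rolv after.
# Path A wins -> h_baseline == h_rolv (bit-identical).
# Path B wins -> hashes differ; the ATOL/cosine gates carry equivalence.
```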

[5] MULTI-BASELINE TIMING

Side-by-side ms/iter, p50, p99, stdev (cv%), GFLOPs/s, tokens/s for cuBLAS-FP32, cuBLAS-FP8, cuSPARSE, and ROLV in the same run. No baseline is hidden. cv% is published — if a measurement was noisy, it shows.
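A generic sketch of how per-baseline statistics in this shape are typically gathered (GPU kernels additionally require a device synchronisation around each timed call — omitted here):

```python
import statistics
import time

def time_baseline(fn, iters: int = 1000, warmup: int = 50) -> dict:
    """ms/iter mean, p50, p99, and cv% for one baseline callable."""
    for _ in range(warmup):
        fn()
    ts = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        ts.append((time.perf_counter() - t0) * 1e3)
    ts.sort()
    mean = statistics.fmean(ts)
    return {"ms_mean": mean,
            "p50": ts[len(ts) // 2],
            "p99": ts[min(iters - 1, int(iters * 0.99))],
            "cv_pct": 100.0 * statistics.stdev(ts) / mean}
```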

[6] CORRECTNESS GATES

max_abs_err, mean_abs_err, max_rel_err%, mean_rel_err%, ATOL gate (≤0.05 on column-normalised fp64), cosine gate (≥0.999), and perturbation gate. PASS / FAIL printed per gate. Cell is FAIL if any gate fails.
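One plausible reading of the two numeric gates, shown for intuition — the harness’s exact normalisation may differ, and the helper name here is illustrative:

```python
import numpy as np

def gates_pass(y_ref: np.ndarray, y_rolv: np.ndarray,
               atol: float = 0.05, cos_min: float = 0.999) -> bool:
    """ATOL on column-normalised fp64 error, plus a global cosine floor."""
    r = y_ref.astype(np.float64)
    s = y_rolv.astype(np.float64)
    col_scale = np.linalg.norm(r, axis=0, keepdims=True) + 1e-30
    atol_ok = np.max(np.abs(s - r) / col_scale) <= atol
    cos = float((r * s).sum() /
                (np.linalg.norm(r) * np.linalg.norm(s) + 1e-30))
    return atol_ok and cos >= cos_min
```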

[7] APPLES-TO-APPLES SPEEDUP

ROLV vs the same-precision vendor baseline (e.g. ROLV path B INT8 vs INT8 cuBLASLt). When the apples-to-apples baseline is faster than ROLV, that is published unflinchingly — e.g. INT8 cuBLASLt 0.84× at sp=0% on H200. No flattering of comparisons.

[8] PRODUCTION DEPLOYMENT BASELINES

ROLV vs the modern stacks actually used in production: TensorRT-LLM INT8, FP8 cuBLAS on Hopper/Blackwell, structured 2:4 sparse, INT8 cuBLASLt, FP16, BF16. Six baselines, all timed in the same session, all reported.

[9] FP32 PRECISION REFERENCE

Speedup vs cuBLAS-FP32 separately reported, both as iter speedup and total speedup including build cost amortised over a single inference. Energy saved %, FLOPs saved %.

[10] cuSPARSE SPARSE VENDOR REFERENCE

Speedup vs cuSPARSE separately reported. cuSPARSE is the NVIDIA-tuned sparse vendor library — the fairest direct sparse-vs-sparse comparison. Energy saved % vs cuSPARSE published.

[11] MEMORY vs COMPUTE EFFICIENCY

Weight bytes (dense vs ROLV), memory reduction %, bytes/FLOP ratio, compute density ratio. Addresses the “memory-bound LLM” reviewer critique — ROLV reduces both bytes moved and FLOPs performed in the same proportion. Speedup is from doing strictly less work, not from clever data layout.

[12] FUSED-KERNEL DISCLOSURE

Lists fused-kernel paths available on the box (FlashAttention, Triton, FP8 hardware, etc.). Reviewer knows which vendor optimisations were enabled in the comparison. No silent disabling of vendor fast paths.

No breadcrumbs, no asterisks, no “up to” framing. Every cell is a complete record. Every claim on this site traces back to a JSON file with the 12 fields above. A reviewer can reproduce the exact case on their hardware, compute the four hashes, and compare them to the published values. If the hashes match and the gates pass, the speedup number is what it is. The benchmark format leaves no room for selective reporting.

Calculators

Measure. Switch. Save.

Quantify ROLV's impact on your infrastructure. The two primary calculators below cover capital and operating expense; below them, three advanced tools for deeper analysis.

▲ Capex Savings Calculator
Current capex
$3.0B
Units saved
80,000
Capex saved
$2.4B
Speedup from published ROLV benchmarks on real model weights.
▲ Opex Savings — Energy Calculator
Total cost/yr
$76.5M
Saved/yr
$35M
3-year saving
$105M
CO₂ avoided/yr
117,000 t
Energy savings based on ROLV benchmark results at stated sparsity level.
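The capex example above is consistent with simple fleet arithmetic — a sketch, assuming the default corresponds to a 5× speedup on a 100,000-unit fleet (80,000 units avoided); the speedup value is our assumption, not a published slider setting:

```python
def capex_saved(current_capex: float, speedup: float) -> float:
    """Same throughput from 1/speedup of the fleet; the rest is avoided spend."""
    return current_capex * (1.0 - 1.0 / speedup)

print(f"${capex_saved(3.0e9, 5.0):,.0f}")  # $2,400,000,000 at an assumed 5x
```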
Advanced tools
△ ROLV Unit™ — Measure True Compute Efficiency

The ROLV Unit™ is a normalised measure of compute efficiency that accounts for sparsity. Unlike TFLOPS (which measures peak theoretical throughput) or tokens/s (which conflates hardware and software), the ROLV Unit measures useful compute — work done on non-zero elements only.

1 ROLV Unit = 1 TFLOP of compute on live (non-zero) matrix elements per second, at full precision, verified by SHA-256 hash.

Your Compute in ROLV Units
Without ROLV
562 RU
wasted on zero rows
With ROLV
2,250 RU
all compute is useful
Cluster efficiency gain
4.0× more useful compute — same hardware
ROLV Unit = TFLOPS on verified non-zero elements. Vendor TFLOPS counts all compute including zero rows.
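The example numbers follow directly from the definition — a sketch, reading the card defaults as a 2,250-TFLOPS cluster at 75% sparsity (our reading of the defaults, not a published configuration):

```python
def rolv_units(cluster_tflops: float, sparsity: float):
    """Useful TFLOPS (= RU) with and without ROLV, per the definition above."""
    live = 1.0 - sparsity
    without_rolv = cluster_tflops * live  # dense: only live-element work counts
    with_rolv = cluster_tflops            # ROLV: every FLOP lands on live data
    return without_rolv, with_rolv, with_rolv / without_rolv

print(rolv_units(2250, 0.75))  # (562.5, 2250, 4.0) — matches the card above
```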
▶ ROLVswitch™ & VRAM — Crossover & Memory Calculator

ROLVswitch™ finds the exact sparsity where ROLV beats dense, and whether your matrix fits in VRAM.

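The decision logic itself is not public; what follows is a hypothetical sketch of the two checks the calculator describes, with illustrative names and a COO-style byte model (one int32 index per non-zero):

```python
def rolvswitch(m: int, k: int, sparsity: float, vram_gb: float,
               value_bytes: int = 2, index_bytes: int = 4):
    """Hypothetical: (1) is sparsity past the storage crossover?
    (2) does the chosen representation fit in VRAM?"""
    crossover = index_bytes / (value_bytes + index_bytes)
    use_rolv = sparsity > crossover
    nnz = int(m * k * (1.0 - sparsity))
    bytes_needed = (nnz * (value_bytes + index_bytes) if use_rolv
                    else m * k * value_bytes)
    return use_rolv, bytes_needed <= vram_gb * 2**30
```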
■ RSMT™ — Sparse Storage Threshold Calculator

RSMT™ finds the exact sparsity threshold where sparse storage beats dense for your dtype.

Why RSMT™ Matters

The crossover point depends entirely on your dtype. With bfloat16 (2 bytes) and int32 indices (4 bytes), sparse format costs 3× more bytes per non-zero than dense. Sparse wins only when you have enough zeros to overcome the index overhead.
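The same byte arithmetic in code — a sketch assuming one int32 index per non-zero (COO-style); CSR-style bookkeeping shifts the exact threshold slightly:

```python
def rsmt_crossover(value_bytes: int = 2, index_bytes: int = 4) -> float:
    """Sparsity above which per-nnz storage (value + index) beats dense.
    dense = n * value_bytes; sparse = nnz * (value_bytes + index_bytes)."""
    return index_bytes / (value_bytes + index_bytes)

print(f"{rsmt_crossover():.1%}")  # 66.7% for bfloat16 values + int32 indices
```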

Your MoE models at bfloat16
Mixtral-8×7B: 75%  ✓ well above crossover
Qwen3-30B-A3B: 93.8%  ✓ far above crossover
Llama-4-Scout: 93.8%  ✓ far above crossover
DeepSeek-V3: 96.9%  ✓ extreme advantage

RSMT™ is computed analytically — no approximation.

Enterprise & Institutional Evaluation

Evaluate on your own hardware.
NDA-gated. Hardware-locked. Signed every run.

Two deployment tiers for serious evaluation on your own models, your own data, your own processors. If you just want to see ROLV working end-to-end first, the live benchmark above runs in under two minutes with no install. All enterprise runs are RolvKey™-signed — SHA-256 over your speedup, processor fingerprint, and a time-bounded attestation.

Recommended
Secure Container

RolvKey™ authenticated.
Hardware-locked Docker.

Evaluation licence + NDA. Container binds to your processor fingerprint at first run — will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments.

Contact rolv@rolv.ai →
Direct Hardware

No Docker.
Single authenticated file.

Bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.

Contact rolv@rolv.ai →
RolvKey™ — New IP — Patent Pending

A second invention, born from protecting the first.

In building the secure distribution system for ROLV Primitive© we developed a novel software protection architecture that we believe has standalone commercial value entirely apart from ROLV itself.

RolvKey™ uses a proprietary multi-layer mathematical key derivation system. Every key exchange is unique and time-bounded to a window of seconds. A captured response is worthless moments later. An attacker who somehow breaks the first layer immediately faces a second independent layer, then a third — each seeded with a completely different secret.

The only viable attack requires simultaneously compromising multiple independent systems within a narrow time window. For any commercial adversary this is not a realistic threat.
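RolvKey™’s internals are proprietary and not described here. For intuition only, a generic time-bounded challenge-response (a standard HMAC technique, not RolvKey’s actual multi-layer derivation) behaves like this — a captured response dies with its time window:

```python
import hashlib
import hmac
import time

WINDOW_S = 10  # illustrative window; responses expire with the slot

def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client answer binds the challenge to the current time slot."""
    slot = int(time.time()) // WINDOW_S
    return hmac.new(secret, challenge + slot.to_bytes(8, "big"),
                    hashlib.sha256).digest()

def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server accepts the current slot or the one just expired (clock skew)."""
    slot = int(time.time()) // WINDOW_S
    for s in (slot, slot - 1):
        expected = hmac.new(secret, challenge + s.to_bytes(8, "big"),
                            hashlib.sha256).digest()
        if hmac.compare_digest(expected, response):
            return True
    return False
```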

Market opportunity

Every software company shipping proprietary compiled code faces the same distribution security problem. Current solutions — hardware dongles, standard license servers, code obfuscation — have well-documented weaknesses. The academic literature identified this specific application — software distribution key management and API attestation — as commercially unsolved. RolvKey™ addresses it.

Live right now

RolvKey™ is protecting ROLV Primitive© today. Every Docker container download, every key exchange, every benchmark run on every machine worldwide is secured by this system. It has been exercised thousands of times in production.

Licensing and partnership enquiries: rolv@rolv.ai

Independent Verification

Every result is independently verifiable.

4 SHA-256 hashes per case. Perturbation test on every result. ATOL=0.05 + cosine≥0.999 on column-normalised fp64. 1,684/1,684 GPU PASS · 573/573 CPU PASS (incl. comprehensive 4-model × 7-layer × 7-sparsity sweep, 196/196). Download the full validation kit with harness code, raw outputs, and reproduction instructions.

↓ Full Benchmark PDF
About the Founder

One bike ride. Six months. A primitive that beats NVIDIA's own libraries.


Rolv Eitrem Heggenhougen — Norwegian-born entrepreneur, mathematics graduate, serial founder with companies built across Europe and the United States. In May 2025, on a bike ride in Fort Lauderdale, he saw that AI matrix operations were doing enormous amounts of unnecessary work. He could see it mathematically. He refused to stop until he had proven it.

"Imagination is the only limitation to innovation."

Contact

Contact Us

rolv@rolv.ai
Patent Pending · ROLV LLC · Fort Lauderdale, FL