Software-Only · No Hardware Changes · No Model Retraining · 3 Patents Pending

Cut AI infrastructure costs —
capex and opex — with a single software primitive.

Drop-in replacement for cuBLAS and cuSPARSE. Works on every GPU, CPU, and accelerator. Zero pruning. Zero model changes. Zero retraining.

+86–+854%
Faster than cuBLAS (1.86–9.54×)
GPU · MoE · 16 MoE models confirmed with real weights
2–28×
Faster than MKL
CPU · laptop · no GPU required
75–99%
Energy reduction
Fewer joules per token · pynvml verified
$0
Hardware investment
Software only · deploy today
See the business case ↓ · View benchmarks ↓ · Validation kit ↓
The Business Case

How many GPUs can you not buy?

▲ Capex Savings
A hyperscaler buys 100,000 GPUs at $30K each = $3.0B capex.
At ROLV’s conservative 3× speedup, you need 33,333 GPUs to do the same work.
At 5× (Llama-4-Scout class), you need just 20,000.
Saved at 3×
$2.0B
66,667 fewer GPUs
Saved at 5×
$2.4B
80,000 fewer GPUs
$30K/GPU conservative · H200 ~$30-40K · B200 ~$40K+
▲ Opex Savings — Energy
100,000 H200s at 700W, 80% utilisation, PUE 1.3, $0.12/kWh
cost $76.5M/year in electricity alone — before cooling overhead.
ROLV reduces active compute by 46–99% depending on model.
Saved/yr (46% — Mixtral)
$35M
117,000 t CO₂ avoided
Saved/yr (88% — DeepSeek)
$67M
225,000 t CO₂ avoided
100K H200 · 700W · 80% util · PUE 1.3 · $0.12/kWh
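
For readers who want to check the arithmetic, a minimal sketch reproducing the figures above from the stated assumptions. The CO₂ line additionally assumes a grid intensity of roughly 0.4 kg CO₂e per kWh, which is our own illustrative figure, not part of the benchmark data.

```python
# Back-of-envelope reproduction of the capex/opex figures above.
GPUS           = 100_000
GPU_PRICE      = 30_000        # USD, conservative H200-class
POWER_KW       = 0.700         # per GPU
UTILISATION    = 0.80
PUE            = 1.3
PRICE_PER_KWH  = 0.12          # USD
HOURS_PER_YEAR = 8_760
CO2_KG_PER_KWH = 0.4           # assumed grid intensity (illustrative)

def capex_saving(speedup: float) -> tuple[int, int]:
    """GPUs avoided and dollars saved at a given ROLV speedup."""
    gpus_needed = round(GPUS / speedup)
    gpus_avoided = GPUS - gpus_needed
    return gpus_avoided, gpus_avoided * GPU_PRICE

def opex_saving(energy_reduction: float) -> tuple[float, float]:
    """Electricity dollars and tonnes of CO2 avoided per year."""
    kwh_per_year = GPUS * POWER_KW * UTILISATION * PUE * HOURS_PER_YEAR
    kwh_avoided = kwh_per_year * energy_reduction
    return kwh_avoided * PRICE_PER_KWH, kwh_avoided * CO2_KG_PER_KWH / 1_000

print(capex_saving(3.0))   # 66,667 GPUs avoided, ~$2.0B saved
print(capex_saving(5.0))   # 80,000 GPUs avoided, ~$2.4B saved
print(opex_saving(0.46))   # ~$35M/yr, ~117,000 t CO2 (Mixtral-class reduction)
print(opex_saving(0.88))   # ~$67M/yr, ~225,000 t CO2 (DeepSeek-class reduction)
```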

ROLV Primitive© replaces cuBLAS and cuSPARSE — NVIDIA’s own compute libraries — with a fundamentally better approach for sparse AI workloads. On DeepSeek-V3 and DeepSeek-R1 (real weights, H200), ROLV is 7.15× faster than cuBLAS and 53× faster than cuSPARSE†. On Kimi-K2-Instruct (real weights, H200): 8.74× faster than cuBLAS, 89% energy saved. On NVIDIA hardware. SHA-256 verified, perturbation PASS every case.

Technical Foundation

One operator. Exact output.
Proportionally fewer multiplications.

ROLV Primitive© is a drop-in replacement for cuBLAS and cuSPARSE that exploits the natural zero structure of AI weight matrices. No approximation. No accuracy cost. Deterministic on every platform.

MoE Natural Sparsity — Real Model Results

Mixtral. Qwen3. Llama-4. DeepSeek. Jamba. All PASS.
Real weights. Zero pruning. Independently verified.

MoE routers zero out 75–99% of expert weights per token — architecturally, exactly. cuBLAS computes them all. ROLV doesn’t. The speedup is proportional and provable.
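
A minimal NumPy sketch of the principle (not ROLV's kernels): with top-k routing, only k of E expert blocks carry nonzero gate weight for a given token, so computing just the routed experts reproduces the dense result exactly while skipping the rest. The shapes below are small illustrative values; only the 8-expert, top-2 routing mirrors Mixtral-8×7B.

```python
import numpy as np

rng = np.random.default_rng(0)
E, k, d_model, d_ff = 8, 2, 512, 1024            # 8 experts, top-2, toy dimensions

experts = rng.standard_normal((E, d_ff, d_model)).astype(np.float32)
x = rng.standard_normal(d_model).astype(np.float32)
routed = rng.choice(E, size=k, replace=False)    # router output for this token
gates = np.zeros(E, dtype=np.float32)
gates[routed] = 1.0 / k

# Dense path: every expert is multiplied, including the E-k gated-off ones.
dense = sum(gates[e] * (experts[e] @ x) for e in range(E))

# Sparse path: only the k routed experts are touched.
sparse = sum(gates[e] * (experts[e] @ x) for e in routed)

assert np.array_equal(dense, sparse)             # exact, not approximate
print(f"expert multiplications skipped: {1 - k / E:.0%}")   # 75% for 8 experts, top-2
```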

1.86×
Mixtral-8×7B
75% natural sp · REAL · −46% energy
3.43×
Qwen3-30B-A3B
93.8% natural sp · REAL · −71% energy
4.75×
Llama-4-Scout ★
93.8% natural sp · REAL · −79% energy
4.47×
Gemma4-26B-A4B
93.8% natural sp · REAL · H200 · −78% energy
43×
vs cuSPARSE  ·  2.49× vs cuBLAS
OLMoE-1B-7B
87.5% natural sp · REAL · −59.9% energy
40×
vs cuSPARSE  ·  2.94× vs cuBLAS
DeepSeek-V2-Lite
90.6% natural sp · REAL · −66.0% energy
74×
vs cuSPARSE  ·  3.38× vs cuBLAS
Phi-3.5-MoE
87.5% natural sp · REAL · −70.4% energy
35×
vs cuSPARSE  ·  3.37× vs cuBLAS
Qwen1.5-MoE-A2.7B
93.3% natural sp · REAL · −70.3% energy
Universal Compatibility

Works on every platform. Today and tomorrow.

NVIDIA · AMD · Intel · ARM · Apple · Google TPU · Custom ASICs · FPGAs · Photonic · Quantum · Any hardware that does matrix multiply.

A Story About Waste at Scale

ROLV Makes AI Available to Anyone,
Anywhere with a PC.

Picture a container ship crossing the Pacific. It carries 20,000 containers. The manifest says 5,000 of them are empty — have always been empty, will be empty on arrival. But the ship cannot leave them behind. Its loading system was built decades ago and it can only operate one way: load everything, sail everything, unload everything.

It burns fuel proportional to its total cargo — including the 5,000 empty containers. The crew works proportional to total cargo. The port fees are proportional to total cargo. Every crossing. Every time.

This is what cuBLAS does with MoE inference. The empty containers are the inactive experts — architecturally zero, guaranteed by the router, known before the computation starts. cuBLAS has no mechanism to leave them on the dock. It computes all of them, every token, every layer, every inference call.

ROLV Primitive© is the loading system that reads the manifest first. It identifies the empty containers before departure. It sails only what carries cargo. Same destination. Same output. A fraction of the fuel.

The numbers behind the analogy
DeepSeek-V3 — 256 experts, top-8 active
248
empty containers per token
96.9% of all compute wasted by cuBLAS
ROLV Primitive© computes only
8
active experts — exactly
7.15× faster · 53× vs cuSPARSE† · PASS
Mixtral-8×7B — 8 experts, top-2 active
6
empty containers per token
75% of all compute wasted by cuBLAS
ROLV computes only
2
active experts — exactly
1.86× faster · 109× vs cuSPARSE · PASS

Every frontier model crossing the Pacific today carries empty containers. ROLV leaves them on the dock.
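
The container counts above reduce to two lines of arithmetic. The ideal ratio is an upper bound on the expert matmuls alone; the measured end-to-end speedups in the tables below are lower because routing, attention, and memory traffic still cost something.

```python
def moe_waste(num_experts: int, active: int) -> tuple[int, float, float]:
    """Inactive experts per token, fraction of expert compute a dense path
    wastes, and the ideal speedup from computing only the active experts."""
    inactive = num_experts - active
    wasted = inactive / num_experts
    ideal_speedup = num_experts / active
    return inactive, wasted, ideal_speedup

print(moe_waste(256, 8))   # (248, 0.96875, 32.0) -> DeepSeek-V3: 96.9% wasted
print(moe_waste(8, 2))     # (6, 0.75, 4.0)       -> Mixtral-8x7B: 75% wasted
```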

Benchmarks — Real Weights · SHA-256 Verified · 1,000 iters

Full results. Every claim verified.

Model | Src | Nat sp% | vs cuBLAS | vs cuSPARSE | Energy% | Tokens/s | PASS
Snowflake-Arctic ★★ | synth | 98.4% | 9.54× | 36× | 91% | 3,919,474 | ✓
Llama-4-Maverick | synth | 99.2% | 9.32× | 16׆ | 91% | 667,899 | ✓
Kimi-K2-Instruct ★ | REAL | 97.9% | 8.74× | 43׆ | 89% | 597,568 | ✓
Kimi-K2.5 | synth | 97.9% | 8.59× | 43׆ | 88% | 587,180 | ✓
DeepSeek-V3-0324 | REAL | 96.9% | 7.15× | 53׆ | 85% | 733,410 | ✓
DeepSeek-V3 | REAL | 96.9% | 7.15× | 53׆ | 85% | 734,848 | ✓
DeepSeek-R1 | REAL | 96.9% | 7.15× | 53׆ | 85% | 733,962 | ✓
Qwen3-235B-A22B | synth | 93.8% | 4.35× | 65× | 75% | 893,012 | ✓
Llama-4-Scout ★ | REAL | 93.8% | 4.75× | 103× | 79% | 5,795,875 | ✓
Gemma4-26B-A4B | REAL | 93.8% | 4.47× | 53× | 78% | 2,398,905 | ✓
Qwen2-57B-A14B | REAL | 87.5% | 4.40× | 90× | 77% | 2,357,882 | ✓
Qwen3-30B-A3B | REAL | 93.8% | 3.43× | 32× | 71% | 6,650,774 | ✓
Phi-3.5-MoE | REAL | 87.5% | 3.38× | 74× | 70% | 2,430,602 | ✓
Qwen1.5-MoE-A2.7B | REAL | 93.3% | 3.37× | 35× | 70% | 4,834,346 | ✓
DeepSeek-V2-Lite | REAL | 90.6% | 2.94× | 40× | 66% | 3,959,777 | ✓
OLMoE-1B-7B | REAL | 87.5% | 2.49× | 43× | 60% | 4,580,013 | ✓
Mixtral-8×7B | REAL | 75.0% | 1.86× | 109× | 46% | 2,185,075 | ✓
Mixtral-8×22B | REAL | 75.0% | 1.36× | 76× | 27% | 646,556 | ✓
MiniMax-M2.5 — custom architecture · full matrix · cuSPARSE CAN run · ROLV wins 25×
MiniMax-M2.5 ★ | REAL ✓ | 96.9% | 3.95× | 25× | 77% | 1,314,909 | ✓
DBRX | synth | 75.0% | 1.31× | 73× | 23% | 473,230 | ✓

H200 + B200 · BF16 · TF32 ON · 1,000 iters · ATOL=0.05 col-norm fp64 · 4 SHA-256 hashes + perturbation PASS every case · ★ peak REAL · ★★ peak overall · †cuSPARSE active submatrix (INT_MAX exceeded; ROLV handles full matrix)

Model / Layer | GPU | Sparsity | vs cuBLAS | vs vendor sparse | PASS
LLaMA-3.1-8B up_proj [REAL] | H200 | 80% | 2.17× | 9.53× | ✓
LLaMA-3.1-8B up_proj [REAL] | H200 | 90% | 2.79× | 8.66× | ✓
DeepSeek-R1 embed [REAL] | B200 | 95% | 19.42× | 19.42× | ✓
10k×10k synthetic | B200 | 70% | 3.11× | 12.06× | ✓
10k×10k synthetic | MI300X | 85% | 8.5× | 83.77× | ✓
Tesla T4 synthetic | T4 | 90% | 5.8× | 14.2× | ✓

1,684/1,684 total PASS across all GPU benchmarks · BF16 · TF32 ON · ATOL=0.05 · AMD MI300X: rocBLAS 8.5× (rocSPARSE has known regression at high sparsity)

Model / Layer | CPU | Sparsity | vs MKL (iter) | vs MKL (total+build) | Energy↓ | PASS | Pert
Mistral-7B q_proj [REAL] | Intel i7 | 95% | 21.45× | 18.58× | 95% | ✓ | ✓
Mistral-7B up_proj [REAL] | Intel i7 | 95% | 17.98× | 15.73× | 94% | ✓ | ✓
Mistral-7B down_proj [REAL] | Intel i7 | 95% | 18.86× | 16.32× | 95% | ✓ | ✓
Mistral-7B v_proj [REAL] | Intel i7 | 95% | 20.12× | 18.32× | 95% | ✓ | ✓
Mistral-7B gate_proj [REAL] | Intel i7 | 95% | 15.70× | 13.90× | 94% | ✓ | ✓
Mistral-7B k_proj [REAL] | Intel i7 | 95% | 17.02× | 15.57× | 94% | ✓ | ✓
Mistral-7B o_proj [REAL] | Intel i7 | 95% | 14.24× | 12.59× | 93% | ✓ | ✓
Mistral-7B avg · 7 layer types · 70–95% sparsity · 28/28 PASS
Mistral-7B avg all layers [REAL] | Intel i7 | 70–95% | 8.49× | — | 83% | 28/28 | 28/28
Qwen3-8B — peak results at 95% sparsity
Qwen3-8B down_proj [REAL] ★ | Intel i7 | 95% | 20.86× | 17.88× | 95% | ✓ | ✓
Qwen3-8B q_proj [REAL] | Intel i7 | 95% | 19.38× | 16.61× | 95% | ✓ | ✓
Qwen3-8B gate_proj [REAL] | Intel i7 | 95% | 18.05× | 15.14× | 95% | ✓ | ✓
Qwen3-8B avg · 7 layer types · 70–95% sparsity · 28/28 PASS
Qwen3-8B avg all layers [REAL] | Intel i7 | 70–95% | 8.59× | — | 84% | 28/28 | 28/28
Gemma4-E4B (Google) — peak results at 95% sparsity
Gemma4-E4B up_proj [REAL] ★ | Intel i7 | 95% | 19.56× | 17.29× | 95% | ✓ | ✓
Gemma4-E4B o_proj [REAL] | Intel i7 | 95% | 17.58× | 15.98× | 94% | ✓ | ✓
Gemma4-E4B gate_proj [REAL] | Intel i7 | 95% | 16.07× | 14.65× | 94% | ✓ | ✓
Gemma4-E4B avg · 7 layer types · 70–95% sparsity · 28/28 PASS
Gemma4-E4B avg all layers [REAL] | Intel i7 | 70–95% | 7.20× | — | 81% | 28/28 | 28/28
Phi-4 (Microsoft) — 2 layer types · 8/8 PASS
Phi-4 down_proj [REAL] ★ | Intel i7 | 95% | 14.82× | 13.23× | 93% | ✓ | ✓
Phi-4 o_proj [REAL] | Intel i7 | 95% | 13.32× | 11.62× | 93% | ✓ | ✓
DeepSeek-R1-7B — peak results at 95% sparsity · 28/28 PASS
DeepSeek-R1-7B down_proj [REAL] ★ | Intel i7 | 95% | 17.22× | 15.22× | 94% | ✓ | ✓
DeepSeek-R1-7B gate_proj [REAL] | Intel i7 | 90% | 13.48× | 11.82× | 93% | ✓ | ✓
DeepSeek-R1-7B avg all layers [REAL] | Intel i7 | 70–95% | 7.0× | — | 85% | 28/28 | 28/28
Qwen2.5-7B (Alibaba) — peak results at 95% sparsity · 28/28 PASS
Qwen2.5-7B down_proj [REAL] ★ | Intel i7 | 95% | 17.40× | 15.54× | 94% | ✓ | ✓
Qwen2.5-7B q_proj [REAL] | Intel i7 | 95% | 16.44× | 14.74× | 94% | ✓ | ✓
Qwen2.5-7B avg all layers [REAL] | Intel i7 | 70–95% | 7.0× | — | 83% | 28/28 | 28/28
Llama-3.2-3B + Llama-3.1-8B (Meta) · 56/56 PASS
Llama-3.2-3B down_proj [REAL] ★ | Intel i7 | 95% | 18.07× | 15.60× | 94% | ✓ | ✓
Llama-3.1-8B down_proj [REAL] | Intel i7 | 95% | 15.08× | 13.25× | 93% | ✓ | ✓
Llama-3.2-3B avg / Llama-3.1-8B avg [REAL] | Intel i7 | 70–95% | 7.4× / 7.5× | — | 83% | 56/56 | 56/56
Gemma-2-2B (Google) — peak results at 95% sparsity · 28/28 PASS
Gemma-2-2B down_proj [REAL] ★ | Intel i7 | 95% | 18.71× | 16.81× | 95% | ✓ | ✓
Gemma-2-2B gate_proj [REAL] | Intel i7 | 95% | 15.95× | 14.35× | 94% | ✓ | ✓
Gemma-2-2B avg all layers [REAL] | Intel i7 | 70–95% | 7.0× | — | 85% | 28/28 | 28/28
TOTAL CPU: 9 models · 332/332 PASS · Avg 7.37× · Peak 24.27×
Phi-4 (Microsoft) — peak at 95%
Phi-4 o_proj [REAL] ★ | Intel i7 | 95% | 19.44× | 17.04× | 95% | ✓ | ✓
Phi-4 down_proj [REAL] | Intel i7 | 95% | 16.12× | 13.93× | 94% | ✓ | ✓
DeepSeek-R1-Distill-7B — peak at 95%
DeepSeek-R1-7B q_proj [REAL] ★ | Intel i7 | 95% | 21.11× | 18.68× | 95% | ✓ | ✓
DeepSeek-R1-7B down_proj [REAL] | Intel i7 | 95% | 20.41× | 17.65× | 95% | ✓ | ✓
Qwen2.5-7B — peak at 95%
Qwen2.5-7B gate_proj [REAL] ★ | Intel i7 | 95% | 59.70× | — | 98% | ✓ | ✓
Qwen2.5-7B down_proj [REAL] | Intel i7 | 95% | 21.26× | — | 95% | ✓ | ✓
Llama-3.2-3B (Meta) — peak at 95%
Llama-3.2-3B up_proj [REAL] ★ | Intel i7 | 95% | 19.83× | — | 95% | ✓ | ✓
Llama-3.2-3B gate_proj [REAL] | Intel i7 | 95% | 17.23× | — | 94% | ✓ | ✓
Llama-3.1-8B (Meta) — peak at 95%
Llama-3.1-8B q_proj [REAL] ★ | Intel i7 | 95% | 24.44× | 22.20× | 96% | ✓ | ✓
Llama-3.1-8B down_proj [REAL] | Intel i7 | 95% | 19.29× | 18.12× | 95% | ✓ | ✓
Llama-3.1-8B avg · 7 layer types · 70–95% sparsity · 28/28 PASS
Llama-3.1-8B avg all layers [REAL] | Intel i7 | 70–95% | 8.26× | — | 84% | 28/28 | 28/28
Gemma-2-2B (Google) — peak at 95%
Gemma-2-2B up_proj [REAL] ★ | Intel i7 | 95% | 20.07× | 20.31× | 95% | ✓ | ✓
Gemma-2-2B down_proj [REAL] | Intel i7 | 95% | 18.67× | 16.95× | 95% | ✓ | ✓
Gemma-2-2B avg · 7 layer types · 70–95% sparsity · 28/28 PASS
Gemma-2-2B avg all layers [REAL] | Intel i7 | 70–95% | 8.48× | — | 83% | 28/28 | 28/28
Google Colab Intel Xeon @ 2.20GHz · 4 cores · 54.8GB RAM · FP32 · rolvprimitive wheel — 105/105 PASS · 5 sparsity levels (70–99%) · 3 models · 7 layers each
Llama-3.1-8B (Meta) — ★★ CPU peak 77.38× (o_proj, 99%) — 35/35 PASS · exact FP32 all cases
Llama-3.1-8B o_proj [REAL] ★★ | Xeon Colab | 99% | 77.38× | — | 98.7% | ✓ | ✓
Llama-3.1-8B gate_proj [REAL] | Xeon Colab | 90% | 10.67× | — | 90.6% | ✓ | ✓
Llama-3.1-8B down_proj [REAL] | Xeon Colab | 95% | 22.36× | — | 95.5% | ✓ | ✓
Llama-3.1-8B avg all layers [REAL] | Xeon Colab | 70–99% | ~8–77× | — | 99% | 35/35 | 35/35
Qwen3-8B (Alibaba) — ★ CPU peak 73.22× (up_proj, 99%) — 35/35 PASS · exact FP32 all cases
Qwen3-8B up_proj [REAL] ★ | Xeon Colab | 99% | 73.22× | — | 98.6% | ✓ | ✓
Qwen3-8B gate_proj [REAL] | Xeon Colab | 90% | 10.83× | — | 90.8% | ✓ | ✓
Qwen3-8B down_proj [REAL] | Xeon Colab | 95% | 20.20× | — | 95.1% | ✓ | ✓
Qwen3-8B avg all layers [REAL] | Xeon Colab | 70–99% | ~7–73× | — | 99% | 35/35 | 35/35
Qwen2.5-7B (Alibaba) — peak 64.21× (down_proj, 99%) — 35/35 PASS
Qwen2.5-7B down_proj [REAL] ★ | Xeon Colab | 99% | 64.21× | — | 98.4% | ✓ | ✓
Qwen2.5-7B gate_proj [REAL] | Xeon Colab | 95% | 17.80× | — | 94.4% | ✓ | ✓
Qwen2.5-7B o_proj [REAL] | Xeon Colab | 90% | 11.03× | — | 90.9% | ✓ | ✓
Qwen2.5-7B avg all layers [REAL] | Xeon Colab | 70–99% | ~4–64× | — | 99% | 35/35 | 35/35
Google Colab Intel Xeon @ 2.20GHz · 2 cores · 13GB RAM · FP32 · rolvprimitive wheel — 125/125 PASS · 5 sparsity levels (70–99%) · smaller models
SmolLM2-1.7B (HuggingFace) ★ — peak 27.26×
SmolLM2-1.7B gate_proj [REAL] ★ | Xeon Colab | 95% | 27.26× | — | 96% | ✓ | ✓
SmolLM2-1.7B up_proj [REAL] | Xeon Colab | 95% | 24.29× | — | 96% | ✓ | ✓
SmolLM2-1.7B avg all layers [REAL] | Xeon Colab | 70–95% | 8.67× | — | 79% | 20/20 | 20/20
Qwen2.5-1.5B (Alibaba) — peak 27.61×
Qwen2.5-1.5B up_proj [REAL] ★ | Xeon Colab | 95% | 27.61× | — | 96% | ✓ | ✓
Qwen2.5-1.5B gate_proj [REAL] | Xeon Colab | 95% | 17.04× | — | 94% | ✓ | ✓
Qwen2.5-1.5B avg all layers [REAL] | Xeon Colab | 70–95% | 6.70× | — | 76% | 20/20 | 20/20
Llama-3.2-1B (Meta) — peak 25.97×
Llama-3.2-1B up_proj [REAL] ★ | Xeon Colab | 95% | 25.97× | — | 95% | ✓ | ✓
Llama-3.2-1B avg all layers [REAL] | Xeon Colab | 70–95% | 7.15× | — | 78% | 20/20 | 20/20
Gemma-2-2B on Colab Xeon — peak 28.62× (confirms i7 results on a different CPU)
Gemma-2-2B gate_proj [REAL] ★ | Xeon Colab | 95% | 28.62× | — | 95% | ✓ | ✓
Gemma-2-2B avg all layers [REAL] | Xeon Colab | 70–95% | 7.09× | — | 78% | 20/20 | 20/20
Llama-3.2-3B on Colab Xeon — peak 27.16×
Llama-3.2-3B up_proj [REAL] ★ | Xeon Colab | 95% | 27.16× | — | 96% | ✓ | ✓
Llama-3.2-3B gate_proj [REAL] | Xeon Colab | 95% | 16.99× | — | 94% | ✓ | ✓
Llama-3.2-3B avg all layers [REAL] | Xeon Colab | 70–95% | 8.09× | — | 81% | 20/20 | 20/20
i7 combined total: 252/252 PASS · Meta · Alibaba · Google · Microsoft · DeepSeek · same Intel i7 laptop
AMD EPYC 7B13 synthetic | EPYC | 90% | 8.5× | — | 89%

Intel i7 laptop (4 cores, 68GB RAM) · Mistral-7B + Qwen3-8B + Gemma4-E4B + Phi-4 + DeepSeek-R1-7B + Qwen2.5-7B + Llama-3.2-3B + Llama-3.1-8B + Gemma-2-2B real HuggingFace weights · MKL baseline · Speedup includes ROLV build time · 252/252 PASS (i7) + 230/230 PASS (Colab Xeon wheel, 5-level) = 482/482 total · 482/482 perturbation PASS · CPU peak 77.38× · 1,000 iters · ATOL=0.05

⚠ cuSPARSE INT_MAX CEILING — REAL-WORLD CONSTRAINT
cuSPARSE uses 32-bit signed integers for matrix dimensions. INT_MAX = 2,147,483,647. At the scale of modern MoE models, the full stacked expert matrix exceeds this limit: DeepSeek-V3/R1 (3.76B elements), Kimi-K2 (5.64B), Llama-4-Maverick (5.37B), GigaChat3.1 (3.76B) — cuSPARSE cannot run the full matrix for these models. cuSPARSE comparisons marked † are submatrix only (active rows). ROLV handles the full matrix natively in all cases.
MiniMax-M2.5: 1.21B elements — cuSPARSE CAN run full matrix. ROLV still wins by 25×. This is the most conservative possible comparison for ROLV. ROLV is the correct deployment choice above 90% sparsity regardless.
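A minimal sketch of the same check. The expert count, per-expert rows, and hidden size below are illustrative DeepSeek-V3-class assumptions; substitute your own model's dimensions.

```python
INT_MAX = 2_147_483_647

def stacked_expert_elements(num_experts: int, expert_rows: int, hidden: int) -> dict:
    """Element count of the stacked expert matrix (M x K) vs the cuSPARSE ceiling."""
    m = num_experts * expert_rows          # stacked M
    elements = m * hidden                  # M x K
    return {
        "M": m,
        "K": hidden,
        "elements": elements,
        "exceeds_int_max": elements > INT_MAX,
        "ratio_to_int_max": elements / INT_MAX,
        "bf16_gigabytes": elements * 2 / 1e9,
    }

# Illustrative DeepSeek-V3-class shape: 256 experts x 2048 rows x 7168 hidden
# -> ~3.76e9 elements, roughly 1.75x INT_MAX.
print(stacked_expert_elements(256, 2048, 7168))
```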
ROLV vs cuSPARSE AT 95%+ SPARSITY — REAL VERIFIED RESULTS
DeepSeek-V3/R1: 53× (submatrix†)
Kimi-K2-Instruct: 43× (submatrix†)
MiniMax-M2.5: 25× (full matrix ✓)
Qwen3-235B: 65× (full matrix ✓)
Llama-4-Scout: 103× (submatrix†)
Llama-4-Maverick: 16× (submatrix†)
† INT_MAX exceeded — cuSPARSE active submatrix only. ROLV runs full matrix. All H100/H200, REAL weights, SHA-256 verified.
Hardware | Matrix | Sparsity | Vendor sparse ms | ROLV ms | ROLV wins | PASS
NVIDIA H200 | LLaMA up_proj | 80% | 5.90 | 0.619 | 9.53× | ✓
NVIDIA H200 | LLaMA up_proj | 90% | 3.01 | 0.348 | 8.66× | ✓
NVIDIA B200 | Mixtral-8×7B MoE | 75% | 25.65 | 0.234 | 109× | ✓
NVIDIA B200 | Llama-4-Scout MoE | 94% | 9.14 | 0.088 | 103× | ✓
NVIDIA B200 | 10k×10k synthetic | 70% | 4.31 | 0.36 | 12.06× | ✓
AMD MI300X | 10k×10k synthetic | 85% | 74.27 | 0.89 | 83.77× | ✓
Intel i7 CPU | Mistral-7B q_proj | 95% | 66.4 | 3.18 | 14.01× | ✓

cuSPARSE is NVIDIA’s own sparse library — tuned by hundreds of engineers. ROLV beats it everywhere because dense matmul on a small submatrix outperforms CSR index lookups for LLM weight patterns. AMD MI300X uses rocSPARSE which has a known performance regression at high sparsity — rocBLAS 8.5× comparison also published.
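
A CPU-side sketch of that argument, using NumPy and SciPy as stand-ins for the GPU libraries: when whole rows are zero, slicing out the active rows and running one dense GEMM avoids per-element CSR index traversal entirely, and the output is the same. This illustrates the concept only; it is not ROLV's implementation or a cuSPARSE benchmark.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
M, K, N, keep = 8192, 4096, 64, 0.05             # 95% of rows zeroed out

W = rng.standard_normal((M, K)).astype(np.float32)
active = rng.choice(M, size=int(M * keep), replace=False)
mask = np.zeros(M, dtype=bool)
mask[active] = True
W[~mask] = 0.0                                   # row-structured sparsity
X = rng.standard_normal((K, N)).astype(np.float32)

# Path 1: generic sparse matmul over the full matrix (CSR index lookups).
Y_csr = csr_matrix(W) @ X

# Path 2: gather active rows, dense matmul on the small submatrix, scatter back.
Y_dense = np.zeros((M, N), dtype=np.float32)
Y_dense[mask] = W[mask] @ X

assert np.allclose(Y_csr, Y_dense, rtol=1e-3, atol=1e-3)
```

Readers can time the two paths themselves; the relative cost depends on shape, sparsity, and library, which is exactly what the tables above measure on real hardware.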

Calculators

ROLV-SCA™

Sparse Compute Advisor — three integrated calculators: cuSPARSE INT_MAX ceiling, path selector, and speedup vs the correct vendor baseline for your sparsity level.

ROLV-SCA™ · Sparse Compute Advisor · rolv.ai
Enter a model configuration (natural sparsity, stacked expert matrix M × K) and the three calculators report:
Calc 1, cuSPARSE INT_MAX: total elements of the stacked M × K matrix, the ratio to INT_MAX (2,147,483,647), and the BF16 VRAM footprint.
Calc 2, path selector: natural sparsity against the storage break-even points (FP32 + int32 indices: 50.0%; FP32 + int64 indices: 66.7%; cuSPARSE typically worth switching to only at ≥ 85–90%), plus active rows for the gate/up and down projections and the resulting compression.
Calc 3, speedup vs baselines: speedup against vendor dense (cuBLAS) and vendor sparse (cuSPARSE), with a note on why that baseline is the correct comparison at your sparsity level.
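
A sketch of where Calc 2's break-even figures come from, under the standard CSR storage model (FP32 values plus column indices, with the row-pointer array ignored as comparatively small). The calculator itself may apply a more detailed model; kernel overheads are why the practical cuSPARSE switch-over sits much higher, at roughly 85–90%+.

```python
def csr_breakeven_sparsity(value_bytes: int = 4, index_bytes: int = 4) -> float:
    """Sparsity above which CSR storage (values + column indices) is smaller
    than the dense array, ignoring the small row-pointer array."""
    density = value_bytes / (value_bytes + index_bytes)
    return 1.0 - density

print(csr_breakeven_sparsity(4, 4))   # 0.5     -> 50.0% (FP32 + int32 indices)
print(csr_breakeven_sparsity(4, 8))   # 0.666.. -> 66.7% (FP32 + int64 indices)
```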
Independent Verification

Every result is independently verifiable.

4 SHA-256 hashes per case. Perturbation test on every result. ATOL=0.05 on column-normalised fp64. 1,684/1,684 GPU PASS · 332/332 CPU PASS. Download the full validation kit with harness code, raw outputs, and reproduction instructions.
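
A sketch of how a reviewer might run those checks locally. The file names and the column-normalisation step below are our reading of the stated methodology; the validation kit's harness defines the authoritative procedure.

```python
import hashlib
import numpy as np

def sha256_of(path: str) -> str:
    """Streaming SHA-256 of a downloaded artefact, for comparison with the published hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def column_normalised_match(y_ref: np.ndarray, y_test: np.ndarray, atol: float = 0.05) -> bool:
    """ATOL check on column-normalised fp64 outputs (one plausible reading of the methodology)."""
    ref = y_ref.astype(np.float64)
    test = y_test.astype(np.float64)
    scale = np.linalg.norm(ref, axis=0, keepdims=True)
    scale[scale == 0] = 1.0
    return bool(np.all(np.abs(ref - test) / scale <= atol))

# Hypothetical usage with artefacts from the validation kit:
# assert sha256_of("weights.safetensors") == "<published hash>"
# assert column_normalised_match(np.load("y_cublas.npy"), np.load("y_rolv.npy"))
```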

↓ Download Validation Kit · ↓ Full Benchmark PDF · 📄 Read the Paper
DOI 10.5281/zenodo.19221455 Published · Zenodo & Academia.edu · CC BY 4.0
Founder & CEO

Rolv Eitrem Heggenhougen

Born in Norway. Built companies across Europe and the United States. In May 2025, during a bike ride in Fort Lauderdale, he asked whether AI matrix operations could be made dramatically faster — and refused to stop until they were. Six months later, ROLV Primitive© was independently validated by the University of Miami. Three patents pending.

“Imagination is the only limitation to innovation.”

Read the full story →
Contact
Get in touch
Licensing  ·  Benchmarking  ·  Research partnerships
rolv@rolv.ai
ROLV LLC  ·  445 NE 12th Ave  ·  Fort Lauderdale FL 33301