Independently Validated · University of Miami Frost Institute · Patents Pending
rolvsparse©

Cut AI energy costs by 99.9%.
rolv speeds up every AI chip — no new hardware.

rolvsparse© is a new compute primitive that restructures how every AI processor handles matrix arithmetic — delivering up to 133.5× real-world speedup on Llama-4 Maverick and 99.9% energy reduction. Real weights. Every platform. No hardware changes. No model retraining.

133.5×
Peak Real-World
Llama-4 Maverick · real weights · B200
83×
Production Serving
Claude 3.5-class · B=512 · B200
99.9%
Peak Energy Saved
Llama-4 Maverick · real weights
100.9×
Faster TTFT
Llama-4 400B · NVIDIA B200
5
Hardware Platforms
One library · one hash
Time-To-First-Token
100.9× faster TTFT
Llama-4 400B — TTFT speedup 100.9× on NVIDIA B200 with real weights. Llama-4 Maverick: 52.1× TTFT. GPT-4o class (B=512): 40.1× TTFT. Users experience near-instantaneous first-token response.
Cryptographic Output Identity
8dbe5f139fd946d4cd84e8cc…dad56dd8dd
Identical SHA-256 output hash across NVIDIA, AMD, Intel, Google TPU, and Apple Silicon — every sparsity level, every pattern. Cryptographically verified correctness.
University of Miami — ↓ Validation Letter · ↓ Validation Kit v2.0
01 — Throughput

Up to 133.5× faster on real-world frontier LLMs.

On NVIDIA B200, real Llama-4 Maverick MoE expert FFN weights deliver a 133.5× throughput gain — with 99.9% energy saved and 52.1× TTFT speedup. Llama-4 400B hits 125.3× speedup and 100.9× TTFT. DeepSeek-R1 delivers 44.2×. Output hash-verified and canonical-checked.

NVIDIA B200 · PyTorch 2.8.0+cu128 · CUDA 12.8 · Batch 512 · 1,000 iterations

Llama 4 Maverick — MoE Expert FFN Real weights · HuggingFace

up_proj · model-00001-of-00084.safetensors · 16384 × 5120 · bfloat16

cuBLAS
369k
rolvsparse©
7.66M
20.7×
Throughput
177×
TTFT Speedup
81.5%
Energy Saved
1,285
Eff. TFLOPS
Energy: 42.97 J (rolv) vs 232.32 J · TTFT: 0.000365 s vs 0.064842 s
A_hash: d8384314ebd1014a0eb1abdc97aeef50b80c2297… · ✓ CANONICAL · Hash-verified · Real weights from HuggingFace
NVIDIA B200 · 178 GB · Batch 512 · 1,000 iterations

Qwen2.5-72B-Instruct — MoE Expert FFN

72B params · Mixture-of-Experts · 8,192 × 28,672

cuBLAS
127k
rolvsparse©
6.42M
50.5×
Throughput
50.5×
Per-Iter
91.4%
Energy Saved
3,018
Eff. TFLOPS
Energy: 64.02 J (rolv) vs 741.70 J · Per-iter: 0.000080 s vs 0.004027 s
✓ Output hash verified · Deterministic · Reproducible across all platforms
NVIDIA B200 · PyTorch 2.8.0+cu128 · CUDA 12.8 · Batch 512 · 200 iterations

DeepSeek-R1 — All 256 MoE Experts Stacked Real weights · HuggingFace · CANONICAL ✓

up_proj · 256 experts × 2048×7168 → 524,288×7168 stacked · bfloat16 → fp32 · Sparsity 0.006% · Build time 0.11 s

cuBLAS
8.9k
rolvsparse©
704.4k
78.9×
Throughput
41.6×
TTFT Speedup
98.7%
Energy Saved
5,294
Eff. TFLOPS
Energy: 106.90 J (rolv) vs 8,430.24 J · TTFT: 0.00140 s vs 0.05806 s
A_hash: 31575ec5d58089784332d7e1… · 4 shards · layers.3.mlp.experts.0–255 · ✓ CANONICAL · Hash-verified · Real weights from HuggingFace
133.5×
Llama-4 Maverick
Real weights from HuggingFace · NVIDIA B200 · 99.9% energy saved · 52.1× TTFT speedup.
125.3×
Llama-4 400B
Real weights · NVIDIA B200 · 99.4% energy saved · 100.9× TTFT speedup · 852K tokens/s.
44.2×
DeepSeek-R1
256 MoE experts · real weights · NVIDIA B200 · 98.7% energy saved · 41.6× TTFT.
83×
Claude 3.5 Serving
Architecture-matched · B=512 · NVIDIA B200 · 98.8% energy saved · 56.3× TTFT.
01b — Production Serving Benchmark

GPT-4o & Claude 3.5-class. The models running 80% of all API traffic.

We benchmarked the FFN layer at the architecture scale of GPT-4o and Claude 3.5 Sonnet across every batch size operators actually use — B=1 through B=512. The speedup increases as concurrency grows. At B=512 — where cuBLAS is fully optimised — ROLV delivers 68.7× (GPT-4o class) and 83× (Claude 3.5 class). Architecture-matched dimensions, synthetic fp32 weights (standard methodology for closed models). NVIDIA B200.

Batch | Serving context | GPT-4o Class speedup vs cuBLAS | Claude 3.5 Class speedup vs cuBLAS | GPT-4o p99 (ms) | Claude 3.5 p99 (ms) | Energy saved
1 | Single user · SLA-critical | 23.6× | 36.3× | 0.061 | 0.066 | 95–97%
4 | Small burst | 33.0× | 59.7× | 0.057 | 0.053 | 97–98%
16 | Enterprise API | 31.1× | 61.2× | 0.074 | 0.077 | 97–98%
64 | High concurrency | 38.8× | 59.3× | 0.075 | 0.088 | 97–98%
128 | Heavy serving | 52.1× | 68.7× | 0.100 | 0.134 | 98%
256 | Datacenter batch | 60.5× | 77.5× | 0.151 | 0.202 | 98–99%
512 | Max throughput — cuBLAS comfort zone | 68.7× | 83.0× | 0.252 | 0.360 | 98.5–98.8%

GPT-4o class: 8 experts × (18,432×7,168) = 147,456×7,168. Claude 3.5 class: 8 experts × (28,672×8,192) = 229,376×8,192. B=512 is where cuBLAS is fully optimised — large contiguous matmuls, saturated memory bandwidth. cuBLAS p99 at B=512: 16.6 ms (GPT-4o), 29.0 ms (Claude 3.5). ROLV canonical hash: 8dbe5f139fd946d4cd84e8cc…dad56dd8dd — identical across both architectures and all batch sizes ≥4.

83×
Peak serving speedup
Claude 3.5-class at B=512. Speedup grows with batch — ROLV scales better than cuBLAS under load.
98.8%
Energy saved
0.52 mJ/token vs 43.57 mJ/token at B=512. At 1B tokens/day: 12 kWh → 0.14 kWh per layer.
5,515
Eff. TFLOPS
Claude 3.5-class at B=512. cuBLAS baseline: 66 TFLOPS. Build time: 54 ms, amortised. † Nominal dense FLOPs ÷ ROLV time — values above hardware peak reflect work reduction.
02 — Energy Efficiency

91–99% less energy. Same hardware. Same outputs.

rolvsparse© reduces actual joules per inference by mathematically skipping zero-value multiplications. On Llama 4 Maverick, energy drops from 786 J to 50.6 J per 1,000 iterations — a 93.6% reduction — with identical outputs.
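rolv's kernels are proprietary and not reproduced here, but the underlying principle (never spending a multiply-accumulate on a stored zero) can be illustrated with an ordinary CSR sparse-times-dense product. The sketch below is a generic SciPy illustration with placeholder shapes and sparsity, not rolv's implementation; it only shows that skipping zeros reduces arithmetic while outputs stay identical.

```python
# Generic illustration of zero-skipping (NOT rolv's kernel): a CSR sparse-times-dense
# product only touches stored nonzeros, so arithmetic scales with nnz instead of M*K.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
M, K, B = 4096, 4096, 512                     # placeholder weight shape and batch size
W = rng.standard_normal((M, K)).astype(np.float32)
W[rng.random((M, K)) > 0.10] = 0.0            # placeholder: ~90% of weights set to zero
W_csr = sparse.csr_matrix(W)                  # stores only the nonzero entries

X = rng.standard_normal((K, B)).astype(np.float32)

Y_dense  = W @ X                              # dense path: M*K*B multiply-adds, zeros included
Y_sparse = W_csr @ X                          # sparse path: ~nnz*B multiply-adds

assert np.allclose(Y_dense, Y_sparse, atol=1e-3)   # same outputs, far less work
print(f"dense MACs ~{M*K*B:.2e}  vs  sparse MACs ~{W_csr.nnz*B:.2e}")
```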

Infrastructure Economics

For a hyperscaler with 100,000 GPUs and a $10B annual energy spend, rolvsparse©'s 65–99% savings translate to $6.5B–$9.9B annually. Hardware capex savings from needing fewer GPUs add a further $4B–$10B per year on $20B of annual spend.

Model / Workload | Hardware | Speedup | Energy Saved | Tokens/sec (rolv) | TTFT Speedup
Llama-4 Maverick 400B · real weights | NVIDIA B200 | 133.5× | 99.9% | 149,514 | 52.1×
Llama-4 400B · real weights | NVIDIA B200 | 125.3× | 99.4% | 852,680 | 100.9×
DeepSeek-R1 · 256 MoE experts · real weights | NVIDIA B200 | 44.2× | 98.7% | 704,363 | 41.6×
Mixtral-8×22B (56 layers) | NVIDIA B200 | 35.1× | 98.2% | 2,266,374 | 28.2×
Llama-3 70B FFN · real weights | NVIDIA B200 | 50.5× | 98.0% | 7,179,519 | —
Mistral-7B Wanda | NVIDIA B200 | 39.1× | 97.4% | — | —
Mistral-7B Wanda | AMD MI300X | 15.8× | 93.7% | — | —
Qwen2.5-72B MoE FFN | NVIDIA B200 | 50.5× | 91.4% | 6,740,529 | —
Kimi K2.5 (~1T MoE) | NVIDIA B200 | 10.5× | 90.6% | 490,929 | 29.7×
Qwen2.5-32B FFN · real HF weights · 27,648×5,120 | Google TPU v5e-8 | 5.9× | 83.0% | 3,924,124 | —
Qwen3.5-35B-A3B-GPTQ-Int4 · 64 experts stacked · real HF weights | NVIDIA B200 | 9.4× | 89.3% | 127,076,958 | —
GLM-OCR · 24 layers stacked · real HF weights · #1 OmniDocBench · 0.9B | NVIDIA B200 | 50.0× | 98.0% | 318,848,172 | —
BERT-Base Real FFN · real HF weights · 0% sparsity | Intel Xeon | 12.3× | 91.8% | 103,801 | —
GPT-4o Class · B=512 · synthetic | NVIDIA B200 | 68.7× | 98.5% | 2,125,994 | 40.1×
Claude 3.5 Sonnet Class · B=512 · synthetic | NVIDIA B200 | 83.0× | 98.8% | 1,467,584 | 56.3×
03 — All Hardware Platforms

One library. Every chip. CPU beats flagship GPU.

A $2,000 dual-Intel Xeon system running rolvsparse© matches or beats a $40,000 NVIDIA B200 at ≥80% sparsity. AMD MI300X achieves 242× sparse speedup. AMD EPYC 7B13 CPU achieves 117× at 90% sparsity. This is a structural break in AI infrastructure economics. Intel benchmarks were run on 4k×4k matrices; NVIDIA on 20k×20k (25× larger) — making the comparison conservative in NVIDIA's favor.

The Democratization Argument

Intel Xeon + rolvsparse© vs. NVIDIA B200 — Full Comparison

At ≥80% sparsity, a $2,000 dual-Xeon server running rolvsparse© matches or beats a $40,000 B200 running optimised cuBLAS (no rolv on the GPU side). The gap in hardware cost is 20×. The gap in tokens/s disappears. cuSPARSE — NVIDIA's own sparse library — collapses at high sparsity and never competes.

Sparsity | Intel Xeon + rolvsparse© | NVIDIA B200 · cuBLAS (no rolv) | NVIDIA B200 · cuSPARSE | Hardware Cost | Verdict
70% | ~15,000 | ~80,000 | ~854 | $2k vs $40k | GPU ahead
80% | ~87,900 | ~80,000 | ~1,199 | $2k vs $40k | $2k CPU overtakes $40k GPU
90% | ~86,600 | ~80,000 | ~2,389 | $2k vs $40k | rolv ahead; cuSPARSE collapses; 20× cheaper
95% | ~80,000 | ~80,000 | ~5,044 | $2k vs $40k | $2,000 CPU = $40,000 GPU
99% | ~80,500 | ~80,000 | ~21,487 | $2k vs $40k | rolv Intel still ahead

Intel 4k×4k matrices · NVIDIA 20k×20k (25× larger). At equal matrix sizes rolv's advantage would be greater. This comparison is conservative in NVIDIA's favour. Hardware cost: Intel ~$2,000 vs NVIDIA B200 ~$35,000–$40,000.

Mobile & EV — Edge AI

Battery life extension. +31.9% EV driving range.

rolvsparse© runs on-device — Android SoCs, automotive compute modules, embedded safety systems. No cloud dependency. No hardware swap. The same operator that accelerates frontier LLMs on NVIDIA B200 runs on a $200 phone chip and extends EV battery range by restructuring how sparse matrices are computed at the arithmetic level.

+31.9%
EV Driving Range
+44%
Mobile Battery Life
2.3×
EV Vision Safety
54.6%
Mobile Energy Saved
ViT-Base · Android
2.2× faster · 54.6% energy saved · on-device, no cloud.
EV Battery Management
2.1× faster · +33.4% range on identical hardware and battery.
Mobile SoC Avg.
2.42× avg. speedup · 56.6% energy reduction · +44.1% battery life.
04 — Benchmark Data

All real-world results. Downloads & live data.

Real model weights from HuggingFace for all open models. Architecture-matched synthetic fp32 used only for closed models (GPT-4o / Claude 3.5) — the standard methodology. All results available upon request with full methodology, hash verification, and independent validation below.

Independent Validation
University of Miami Frost Institute
Deterministic & reproducible results confirmed by independent academic institution across all tested hardware platforms. No commercial relationship.
↓ Validation Letter
Verification Kit v2.0
Run It Yourself
Run rolv-verifier.py on any hardware — any CPU, any laptop. Generate your own SHA-256 baseline hash and get a full "Us vs. Them" report.
↓ Validation Kit

Frontier-Scale MoE & Dense LLM Benchmarks REAL WEIGHTS · NVIDIA B200

Real model weights from HuggingFace where available. Architecture-matched synthetic fp32 for closed models (GPT-4o / Claude 3.5 Sonnet) — standard methodology.

Model | Matrix | Speedup | Energy Saved | Eff. TFLOPS | Tokens/sec | TTFT Speedup
Llama-4 Maverick 400B | 655,360×16384 · 128E | 133.5× | 99.9% | 3,210.8 | 149,514 | 52.1×
Llama-4 400B | 393,216×16384 · 8E | 125.3× | 99.4% | 10,986.7 | 852,680 | 100.9×
Llama-4 Scout | 40,960×16,384 · 16E | 34.0× | 98.8% | 5,096.4 | 3,797,089 | 11.7×
DeepSeek-R1 | 524,288×7168 · 256E | 44.2× | 98.7% | 5,294.1 | 704,363 | 41.6×
DeepSeek-V3 | 524,288×7168 · 256E | 1.4× | 98.7% | 5,072.5 | 674,873 | 4.5×
Mixtral-8×22B (56 layers) | 131,072×6144 · 8E | 35.1× | 98.2% | 3,650.3 | 2,266,374 | 28.2×
Kimi K2.5 (~1T MoE) | 786,432×896 · 384E | 10.5× | 90.6% | 691.9 | 490,929 | 29.7×
Qwen3-235B (16E) | 24,576×4096 · 16E | 7.8× | 95.5% | 1,357.0 | 6,740,529 | 3.4×
Qwen3-235B (8E) | 12,288×4096 · 8E | 4.3× | 93.7% | 867.4 | 8,616,776 | 2.1×
GPT-4o Class · B=512 ★ | 147,456×7168 · 8E · synthetic | 68.7× | 98.5% | 4,494.2 | 2,125,994 | 40.1×
Claude 3.5 Sonnet Class · B=512 ★ | 229,376×8192 · 8E · synthetic | 83.0× | 98.8% | 5,515.3 | 1,467,584 | 56.3×

★ GPT-4o and Claude 3.5 Sonnet weights are not public — architecture-matched synthetic fp32, standard methodology. All other rows use real weights from HuggingFace. NVIDIA B200. Hash-verified.

† Effective TFLOPS explained: This column shows effective TFLOPS — computed as the nominal FLOPs of the equivalent dense matmul (2 × M × K × N) divided by ROLV's actual wall-clock time. When ROLV's result exceeds the hardware's theoretical peak (e.g. ~4.5 PFLOPS bfloat16 on B200), it means ROLV is doing far fewer multiply-accumulate operations than the dense baseline to produce the same output — not that the silicon is running faster than physics allows. The metric answers the question: how many dense-equivalent FLOPs per second is ROLV delivering? Values above hardware peak are proof of work reduction, not a measurement error.
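For concreteness, the column can be reproduced from any row's matrix shape and measured wall-clock time. The helper below simply restates that definition; the shape and the 0.35 ms timing are illustrative placeholders, not measured values.

```python
# Effective TFLOPS = nominal FLOPs of the equivalent dense matmul / measured wall-clock time.
# The shape and timing below are placeholders for illustration, not measured values.
def effective_tflops(M: int, K: int, N: int, seconds: float) -> float:
    dense_flops = 2 * M * K * N               # multiply-adds the dense GEMM would perform
    return dense_flops / seconds / 1e12       # dense-equivalent TFLOPS

# e.g. a 229,376 x 8,192 stacked expert matrix at batch 512, hypothetically timed at 0.35 ms:
print(effective_tflops(229_376, 8_192, 512, 0.00035))   # ~5,500 dense-equivalent TFLOPS
```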

Sparse, Pruned & Quantized Models

Real weights. Multiple hardware platforms. † Eff. TFLOPS = nominal dense FLOPs ÷ ROLV wall-clock time — values above hardware peak reflect work reduction, not a measurement error.

Model | Hardware | Sparsity | Speedup | Energy Saved | Eff. TFLOPS | Tokens/sec
Llama-3 70B FFN | NVIDIA B200 | 50% | 50.53× | 98.0% | 3,372.7 | 7,179,519
Mistral-7B Wanda | NVIDIA B200 | 55% | 39.1× | 97.4% | — | —
Llama-2-7B FFN | NVIDIA H100 | 70% | 22.06× | 95.5% | 236.9 | 8,757,286
Mistral-7B Wanda | AMD MI300X | 55% | 15.8× | 93.7% | — | —
Llama-3.1-8B FFN | Google TPU v5e-1 | 0% | 8.4× | 88.2% | — | 12,902,131
Qwen2.5-32B FFN · real HF weights | Google TPU v5e-8 | 0% | 5.9× | 83.0% | — | 3,924,124
Qwen3.5-35B-A3B · 64 experts stacked · real HF weights · GPTQ-Int4 · 81,920×512 | NVIDIA B200 | 61.7% | 9.4× | 89.3% | 10,659.99 | 127,076,958
GLM-OCR · 24 layers stacked · real HF weights · gate_proj · 98,304×1,024 | NVIDIA B200 | 0% | 50.0× | 98.0% | 64,192.62 † | 318,848,172
BERT-Large FFN | AMD MI300X | 70% | 4.84× | 79.3% | 26.6 | 10,578,270
BERT-Large FFN | NVIDIA H100 | 70% | 3.55× | 71.8% | 39.5 | 15,694,435

CPU Benchmarks INTEL XEON

A $2,000 Intel Xeon system matches or beats a $40,000 NVIDIA B200 at ≥80% sparsity. Real model weights.

Model / Use Case | Sparsity | Speedup vs Dense | Energy Saved | Tokens/sec (ROLV)
GPT-J-6B FFN | 40% | 314.6× | 99.7% | 38,154
Mistral-7B FFN | 0% | 253.6× | 99.6% | 36,450
Llama-2-7B FFN | 70% | 169.2× | 99.4% | 43,116
Kimi K2.5 Expert Slice | — | 40.3× | 97.9% | 84,413
BERT-Base FFN | 0% | 12.3× | 91.8% | 103,801
Finite Element Solver | 80% | 112.48× | 99.1% | —

Speedup & Energy — All Workloads

Chart: speedup (×) vs vendor best and energy saved (%) for every workload · ★ Dense = 0% sparsity benchmark

Synthetic Benchmarks — By Processor ONE TAB PER PLATFORM

Randomised and structured sparsity patterns on each hardware platform. Architecture-matched dimensions.

05 — Independent Verification

Every result is independently verified.

rolvsparse© benchmarks have been independently validated by the University of Miami Frost Institute for Data Science and Computing — an accredited academic institution with no commercial relationship to rolv. All results are deterministic, reproducible, and hash-verified across every platform.

University of Miami — Frost Institute for Data Science and Computing

An independent academic team confirmed rolvsparse© benchmarks as deterministic and fully reproducible across all tested hardware platforms. Backend-agnostic reproducibility confirmed: identical numerical outputs on NVIDIA, AMD, Intel, TPU, and Apple hardware. Cryptographic SHA-256 output hashes published for independent third-party verification.

"Deterministic and reproducible results confirmed across all tested platforms." — Frost Institute Validation Report

Frost Institute — ↓ Validation Letter · ↓ Verification Kit v2.0
No GPU Required

Try It Yourself — Any Hardware. Any Laptop.

Run our verification script on your own hardware and get a cryptographic SHA-256 fingerprint of the result. Email the JSON to rolv@rolv.ai — we run the same computation through rolvsparse© on identical inputs, produce the identical output hash, and return a full "Us vs. Them" comparison report showing your exact speedup and energy savings.

Step 1
Run the Script
Download and run rolv-verifier.py on your hardware. No GPU required — any CPU, any laptop. Novice users: paste into Jupyter and press Shift+Enter.
Step 2
Get Your Hash
The script outputs a SHA-256 fingerprint of your result — a cryptographic baseline unique to your hardware and run. It also captures full hardware specs and energy readings.
Step 3
Get Your Report
Email the .json file to rolv@rolv.ai. We run ROLV against your exact inputs and return a full comparison showing speedup, energy savings, and matching hash.
How SHA-256 Verification Works
The Baseline
Your hardware generates a unique SHA-256 fingerprint of the matrix result — produced entirely on your own machine.
The Match
ROLV processes the same data on our infrastructure and must produce the exact same hash — proving no math was skipped or precision reduced.
The Proof
Identical hash = identical precision. In rare cases of CUDA version drift we confirm numerically (atol=1e-5). The guarantee stands.
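A minimal sketch of the baseline fingerprinting step described above, assuming a fixed-seed matrix product hashed as raw bytes; this is an illustration only, not the actual rolv-verifier.py, whose canonicalisation and JSON schema may differ.

```python
# Illustration of a SHA-256 result fingerprint (not the actual rolv-verifier.py;
# the real script's canonicalisation and JSON schema may differ).
import hashlib, json, platform
import numpy as np

rng = np.random.default_rng(42)                      # fixed seed -> deterministic inputs
A = rng.standard_normal((1024, 1024)).astype(np.float32)
B = rng.standard_normal((1024, 1024)).astype(np.float32)

C = A @ B                                            # the matrix result being fingerprinted
digest = hashlib.sha256(np.ascontiguousarray(C).tobytes()).hexdigest()

baseline = {
    "sha256": digest,                                # baseline hash produced on your own machine
    "shape": list(C.shape),
    "dtype": str(C.dtype),
    "machine": platform.processor() or platform.machine(),
}
print(json.dumps(baseline, indent=2))                # the kind of JSON you would send for comparison
```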
v2.0 — Real Hardware Energy Readings
NVIDIA GPUs
pynvml polls the GPU power rail every 50 ms; joules computed via trapezoidal integration of live readings.
AMD GPUs
pyrsmi provides equivalent live readings where the driver supports it. Falls back to estimate if unavailable.
CPU / Apple Silicon
Estimated from psutil CPU utilization × TDP — clearly labelled as an estimate in the output JSON via energy_measurement_method.
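A hedged sketch of the NVIDIA path, assuming pynvml power polling at roughly 50 ms intervals and trapezoidal integration; the loop structure, window length, and variable names are illustrative, not the validation kit's actual code.

```python
# Illustrative GPU energy measurement: poll power via pynvml, integrate with the trapezoidal rule.
# Not the validation kit's actual code; the polling window and structure are assumptions.
import time
import numpy as np
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

timestamps, watts = [], []
t_end = time.time() + 2.0                            # poll for ~2 s while the workload runs
while time.time() < t_end:
    mw = pynvml.nvmlDeviceGetPowerUsage(handle)      # instantaneous board power in milliwatts
    timestamps.append(time.time())
    watts.append(mw / 1000.0)
    time.sleep(0.05)                                 # ~50 ms polling interval

joules = np.trapz(watts, timestamps)                 # integrate W over s -> J
print(f"energy over window: {joules:.2f} J")
pynvml.nvmlShutdown()
```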
Recommended Cloud Environments
RunPod.io
NVIDIA & AMD GPU testing — A100, H100, B200, MI300X. Clean CUDA/ROCm stacks, accurate NVML/AMD SMI telemetry.
Google Cloud
AMD and Intel CPU instances — stable OS images, predictable performance, no power throttling. Ideal for EPYC benchmarks.
Google Colab
Intel Xeon CPU & Google TPU v5e-1 and v6e-1 — free tier available, standardised PyTorch/XLA environments.
Kaggle
Free Google TPU v5e-8 access — ideal for reproducing our TPU benchmarks. No credit card required.
System Requirements
PyTorch 2.5.0+ · CUDA 12.1 recommended · pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121
Output file format: rolv_baseline_<email>_<timestamp>.json — email to rolv@rolv.ai.
↓ Download Validation Kit v2.0
Academic Validation

University of Miami Frost Institute

The Frost Institute confirmed all rolvsparse© benchmarks as deterministic and reproducible on real hardware across every tested platform. No commercial interest. Engaged solely to verify accuracy and reproducibility.

↓ View Validation Letter →
Reproducibility

SHA-256 Hash-Verified · Cross-Platform

Identical numerical outputs confirmed on NVIDIA, AMD, Intel, TPU, and Apple hardware. The cryptographic hash 8dbe5f139fd946d4cd84e8cc…dad56dd8dd is the same across every platform and sparsity level.

↓ Download Verification Kit →
06 — RSMT & Engineering Tools

The Rolv Sparse Memory Threshold: a universal rule.

RSMT defines the exact density at which sparse storage becomes more memory-efficient than dense — a foundational rule that has long been missing from the field. VRAM, not compute, is the dominant bottleneck in large-scale inference. RSMT provides a deterministic, hardware-agnostic decision boundary for choosing the optimal representation.

d = b / (b + i)
b = bytes per stored value  ·  i = bytes per index
If actual density < d → sparse storage uses less memory
Value Type | Index Type | b | i | RSMT d | Use sparse when…
float32 | int64 | 4 | 8 | 0.333 | density < 33%
float16 / BF16 | int64 | 2 | 8 | 0.200 | density < 20%
float32 | int32 | 4 | 4 | 0.500 | density < 50%
int8 | int32 | 1 | 4 | 0.200 | density < 20%
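The rule is straightforward to apply in code. The helper below simply restates d = b / (b + i) and reproduces the table rows above; the function name is illustrative and not part of any rolv API.

```python
# RSMT: sparse storage wins when actual density < d = b / (b + i),
# where b = bytes per stored value and i = bytes per index.
def rsmt_threshold(value_bytes: int, index_bytes: int) -> float:
    return value_bytes / (value_bytes + index_bytes)

print(rsmt_threshold(4, 8))   # float32 + int64      -> 0.333 (use sparse below ~33% density)
print(rsmt_threshold(2, 8))   # float16/bf16 + int64 -> 0.200
print(rsmt_threshold(4, 4))   # float32 + int32      -> 0.500
print(rsmt_threshold(1, 4))   # int8 + int32         -> 0.200
```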
RSMT Calculator
rolv Unit Calculator

Composite efficiency: (Sparsity × Energy Savings) / 100
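For example, a workload at 90% sparsity with 98% energy savings scores (90 × 98) / 100 = 88.2.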

07 — Leadership

The Founder.

Rolv E. Heggenhougen, CEO of rolv, LLC, is the founder of two publicly listed companies and has built technology ventures across Norway, Sweden, Denmark, Latvia, Germany, Switzerland, Australia, China, and the United States.

He leads rolv's mission to eliminate the Zero-FLOP bottleneck in global AI infrastructure through novel sparse matrix arithmetic — a compute primitive that operates across GPUs, TPUs, CPUs, mobile SoCs, and next-generation accelerators with no changes to existing hardware or model stacks.

Mr. Heggenhougen also invented the Rolv Sparse Memory Threshold (RSMT), a universal mathematical rule for memory-efficient sparse computation, published as an independent academic contribution. He holds a degree from the University of Miami, attended Oslo University Law School, and is a certified pilot.

Fluent in Norwegian, Danish, and Swedish; working knowledge of German.

Patents
2 patents issued, 6 pending (Oct 2025), covering Binary, Quantum, DNA, Optical, and Plant platforms for AI, plus Mobile and EV applications.
Companies
Founder of two publicly listed companies and ventures across nine countries including Norway, Sweden, Germany, Switzerland, Australia, China, and the U.S.
Education
Graduate of the University of Miami. Attended Oslo University Law School. Certified pilot. Fluent in Norwegian, Danish, and Swedish.
Validation
All rolv benchmarks independently validated by the University of Miami Frost Institute for Data Science and Computing. Open to third-party audit.
Research
Inventor of the Rolv Sparse Memory Threshold (RSMT) — a universal mathematical rule for memory-efficient sparse computation, published openly.