One primitive — no model changes — GPU · CPU · any platform

AI inference up to 106× faster,
with up to 99% less energy.
Same hardware. Same model. One import line.

Run the benchmark
GPU · CPU · any device · ~2 minutes · results signed to your name
Already ran a test?
Token active
Choose benchmark — runs on our server
Nothing installs · works on any device · signed SHA-256 result
Server: 8 vCPU · 32 GB RAM · shared · a fraction of the H200/B200 hardware used for our published 100×+ results
OLMoE-1B-7B · 43× vs cuSPARSE · 2.49× vs cuBLAS · 87.5% natural sparsity · REAL
DeepSeek-V2-Lite · 40× vs cuSPARSE · 2.94× vs cuBLAS · 90.6% natural sparsity · REAL
Phi-3.5-MoE · 74× vs cuSPARSE · 3.38× vs cuBLAS · 87.5% natural sparsity · REAL
Qwen1.5-MoE-A2.7B · 35× vs cuSPARSE · 3.37× vs cuBLAS · 93.3% natural sparsity · REAL
Universal Compatibility

Works on every platform. Today and tomorrow.

NVIDIA · AMD · Intel · ARM · Apple · Google TPU · Custom ASICs · FPGAs · Any hardware that does matrix multiply.

Live benchmark

Your device vs ROLV. Side by side. Right now.

The left panel runs standard matrix multiply in your browser — your actual hardware. The right panel runs ROLV on our server with identical inputs. Both signed and explained.

Your hardware — this machine runs the baseline
Benchmark server: 8 vCPU · 32 GB RAM · CPU-only · shared with other visitors · a fraction of the NVIDIA H200/B200 hardware where ROLV delivers 100×+ speedups
Keep this tab open until the signed result appears. If your browser disconnects, our server still saves your result and emails it to you — but you won’t see it on this screen unless the tab stays open.
Your browser — MKL baseline
Standard AI computation — no optimisation
This is what every AI system runs today
Computing...
ROLV server — same inputs
ROLV Primitive© — server-side, protected
Real AI model weights, 500 iterations, signed result
Contacting server... (may take 30s if sleeping)
Free · no install · works on any device · results signed and verifiable
A Story About Waste at Scale

ROLV Makes AI Available to Anyone,
Anywhere with a PC.

Picture a container ship crossing the Pacific. It carries 20,000 containers. The manifest says 5,000 of them are empty — have always been empty, will be empty on arrival. But the ship cannot leave them behind. Its loading system was built decades ago and it can only operate one way: load everything, sail everything, unload everything.

It burns fuel proportional to its total cargo — including the 5,000 empty containers. The crew works proportional to total cargo. The port fees are proportional to total cargo. Every crossing. Every time.

This is what cuBLAS does with MoE inference. The empty containers are the inactive experts — architecturally zero, guaranteed by the router, known before the computation starts. cuBLAS has no mechanism to leave them on the dock. It computes all of them, every token, every layer, every inference call.

ROLV Primitive© is the loading system that reads the manifest first. It identifies the empty containers before departure. It sails only what carries cargo. Same destination. Same output. A fraction of the fuel.

The numbers behind the analogy

DeepSeek-V3 — 256 experts, top-8 active
248 empty containers per token · 96.9% of all compute wasted by cuBLAS
ROLV Primitive© computes only the 8 active experts — exactly
8.76× faster · 110× vs cuSPARSE · PASS

Mixtral-8×7B — 8 experts, top-2 active
6 empty containers per token · 75% of all compute wasted by cuBLAS
ROLV computes only the 2 active experts — exactly
1.86× faster · 109× vs cuSPARSE · PASS

Every frontier model crossing the Pacific today carries empty containers. ROLV leaves them on the dock.
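The wasted-compute fractions in the analogy follow directly from the expert counts. A minimal sketch (the counts come from the models above; `moe_waste` is our illustrative helper, not a ROLV API):

```python
def moe_waste(total_experts: int, active_experts: int) -> dict:
    """Fraction of per-token expert compute a dense kernel spends on inactive experts."""
    empty = total_experts - active_experts
    return {"empty_per_token": empty,
            "wasted_pct": 100.0 * empty / total_experts}

print(moe_waste(256, 8))  # DeepSeek-V3: 248 empty containers, 96.875% wasted
print(moe_waste(8, 2))    # Mixtral-8x7B: 6 empty containers, 75.0% wasted
```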

Energy at Scale — The Bigger Story

At 95% sparsity, ROLV does 5% of the work.

It also saves 95% of the energy. The two are the same number, by construction.

ROLV doesn’t use a clever data layout to look fast. It computes only the live elements. The compute reduction and the energy reduction are the same physical event — FLOPs that never happen, watt-seconds that never burn, joules that never reach the heat exchanger. At 95% natural sparsity (the regime modern frontier MoE models operate in), ROLV does roughly 1/20th of the work and uses roughly 1/20th of the energy.

A 1,000-GPU H200 CLUSTER

~$950K/yr

saved on electricity alone (700W per GPU · $0.10/kWh · 24/7 utilisation · 95% workload reduction)

WITH PUE ≈ 1.5

~$1.4M/yr

total facility savings including cooling, power conversion, and infrastructure overhead

CARBON AVOIDED

~3,500 t CO₂/yr

per 1,000 GPUs · equivalent to taking 750 cars off the road, every year, indefinitely

AT HYPERSCALER SCALE

~$140M/yr

facility-level savings on a 100,000-GPU cluster · 350,000 t CO₂/yr avoided · no model retraining · no accuracy loss
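The cluster figures above can be reproduced to order of magnitude from the stated assumptions. A back-of-envelope sketch; the 0.4 kg CO₂/kWh grid intensity is our assumption, and the exact published dollar figures fold in overheads not modelled here:

```python
def cluster_savings(n_gpus: int, watts_per_gpu: float = 700,
                    price_per_kwh: float = 0.10, workload_reduction: float = 0.95,
                    pue: float = 1.5, kg_co2_per_kwh: float = 0.4) -> dict:
    """Annual facility-level savings from cutting GPU work by `workload_reduction`."""
    it_kwh = n_gpus * watts_per_gpu / 1000 * 8760   # IT energy per year at 24/7
    saved_kwh = it_kwh * pue * workload_reduction   # facility energy avoided (incl. cooling)
    return {"saved_usd_per_yr": saved_kwh * price_per_kwh,
            "avoided_t_co2_per_yr": saved_kwh * kg_co2_per_kwh / 1000}

print(cluster_savings(1_000))    # roughly $0.9M/yr and ~3,500 t CO2/yr at these assumptions
print(cluster_savings(100_000))  # hyperscaler scale: 100x each figure
```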

Why this matters now

AI inference electricity demand is on track to rival the energy budget of small nations. The U.S. grid is already constrained — new data centres are being delayed by interconnection queues that stretch years. The conventional answer is “build more power plants.” ROLV offers a different answer: do the same inference with 1/20th the electricity. Same models. Same accuracy. Bit-equivalent output verified by SHA-256.

For a frontier-AI hyperscaler, this is the difference between needing one new gas turbine peaker plant and not needing one. For a sovereign AI program, it is the difference between depending on imported energy and being energy-independent. For a CPU-only deployment of an open-weight model, it is the difference between “impossible without GPUs” and “running today on the laptop you already own.”

Energy figures derived from FLOPs reduction proportionality and validated against on-device hardware power readings (pynvml on NVIDIA, time-ratio proxy on CPU). Sparsity assumed at 95% (typical of MoE production deployment). Per-cell energy% disclosure available in the per-case JSON output. Real models, real weights, real watts.

Benchmarks — Real Weights · SHA-256 Verified · 1,000 iters

Full results. Every claim verified.

Latest source · SHA-256 hashed inputs and outputs · ATOL≤0.05 + cosine≥0.999 + perturbation gate every cell · 7 baselines per case. Production speedups vs the modern inference stack actually deployed today — FP8 cuBLAS on Hopper/Blackwell, NVIDIA TensorRT-LLM INT8, INT8 cuBLASLt, structured 2:4 sparse — alongside the legacy sparse vendor reference (cuSPARSE).

■ H200 NVL · Llama-3.1-8B + Mistral-7B-Instruct · 112/112 PASS

Two independent model architectures. Identical matrix shapes for these layer types. The numbers below come from two separate runs on real public HuggingFace weights — and they match within measurement noise (cv% < 1.3% on most cells). That cross-architecture consistency is itself a validation of the underlying ROLV behaviour.

Layer | Sp% | Llama-3.1-8B vs FP8 | Mistral-7B vs FP8 | vs TRT-LLM INT8 | vs 2:4 struct | vs cuSPARSE
down_proj ★ PEAK | 99% | 42.93× | 42.62× | ~36× | ~78× | ~22×
gate_proj | 99% | 39.17× | 38.99× | ~32× | ~72× | ~19×
up_proj | 99% | 39.16× | 39.03× | ~32× | ~72× | ~20×
q_proj | 99% | 23.58× | 23.66× | ~31× | ~44× | ~15×
Peaks at 99% sparsity · full sweep covers 0/50/70/85/90/95/99% · both weight_prune and activation_natural modes · 112/112 cells PASS · vs INT8 cuBLASLt (apples-to-apples vendor INT8) ranges 0.84× (sp=0%) to ~25× (sp=99%) — honest disclosure

NVIDIA H200 NVL (150 GB VRAM, sm_90) · FP32 calibration · ROLV path B (INT8 cuBLASLt) selected by content-aware dispatcher · batch=1024 · 1000 iters per cell · cv% < 1% on most cells

■ H200 NVL · Tier 0 sweep summary · 5 models · 952/952 PASS

Model | Cases | Mean vs FP8 (0% sp) | Mean vs FP8 (95% sp) | Peak vs FP8 | Peak vs TRT-LLM | PASS
SmolLM2-1.7B | 224 | 3.89× | 7.82× | 9.46× | 15.21× | 224/224
Qwen2.5-1.5B | 224 | 4.17× | 8.34× | 10.92× | 17.04× | 224/224
Phi-3.5-mini | 56 | 4.51× | 8.96× | 11.83× | 19.62× | 56/56
Qwen2.5-7B | 224 | 4.83× | 9.41× | 14.18× | 23.71× | 224/224
DeepSeek-R1-Distill-7B | 224 | 4.32× | 8.87× | 12.74× | 20.96× | 224/224
Tier 0 H200 total: 952/952 PASS · monotonic per-sparsity curves on every model · 5–14× vs FP8 in production sparsity band (70–90%) · peaks at 99% sparsity

Real HuggingFace weights · sparsity 0/50/70/85/90/95/99% · both weight_prune and activation_natural modes · per-layer testing across q/k/v/o/gate/up/down projections · 1000 iters per cell · 4 SHA-256 hashes (input A, input V, baseline output, ROLV output) + perturbation test every case

■ Intel i7-1165G7 (4-core consumer laptop) · Tier 0 sweep · 3 models · 63/63 PASS

Model | Cases | vs BF16 production (mean) | vs MKL sparse (mean) | vs INT8 dynamic (mean) | PASS
SmolLM2-1.7B | 28 | 4.27× | 16.83× | 1.18× | 28/28
Qwen2.5-1.5B | 28 | 3.92× | 14.21× | 0.94× | 28/28
Phi-3.5-mini | 7 | 3.61× | 10.45× | 0.78× | 7/7
Tier 0 i7 total: 63/63 PASS · ROLV is roughly tied with INT8 dynamic on consumer CPU at this matrix scale (apples-to-apples honesty) · dominates against BF16 production and MKL sparse · full Xeon (Sapphire Rapids+ with AMX) benchmarks pending separate run

Intel i7-1165G7 (Tiger Lake, 4 cores / 8 threads, 64 GB RAM) · oneDNN baseline path · sparsity 0/50/70/85/90/95/99% · ROLV path B (INT8 dynamic) chosen by dispatcher in most cells · thermal headroom limited on consumer chassis (cv% spikes 30%+ on some smollm cells, disclosed)

Model | Src | Nat sp% | vs cuBLAS | vs cuSPARSE | Energy% | Tokens/s
Mixtral-8×7B | REAL | 75.0% | 1.86× | 109× | 46% | 2,185,075
Mixtral-8×22B | synth | 75.0% | 2.43× | 107× | 59% | 1,073,568
Qwen2-57B-A14B | synth | 87.5% | 3.37× | 70× | 70% | 2,374,040
Qwen3-30B-A3B | REAL | 93.8% | 3.43× | 32× | 71% | 6,650,774
Llama-4-Scout ★ | REAL | 93.8% | 4.75× | 103× | 79% | 5,795,875
DeepSeek-V3/R1 | synth | 96.9% | 8.76× | 110× | 89% | 1,758,046

NVIDIA B200 · BF16 · TF32 ON · 1,000 iters · ATOL=0.05 col-norm fp64 · 4 SHA-256 hashes + perturbation PASS

Model / Layer | GPU | Sparsity | vs cuBLAS | vs vendor sparse
LLaMA-3.1-8B up_proj [REAL] | H200 | 80% | 2.17× | 9.53×
LLaMA-3.1-8B up_proj [REAL] | H200 | 90% | 2.79× | 8.66×
DeepSeek-R1 embed [REAL] | B200 | 95% | 19.42× | 19.42×
10k×10k synthetic | B200 | 70% | 3.11× | 12.06×
Tesla T4 synthetic | T4 | 90% | 5.8× | 14.2×

■ AMD MI300X · 10-model portfolio · 486/486 PASS

Real layer shapes from 10 production-grade frontier models — including the largest open-weight LLMs in deployment. Same hybrid harness as our NVIDIA portfolio, with rocBLAS/rocSPARSE auto-selected. Headline: peak 74.02× vs rocSPARSE (NVIDIA-equivalent: cuSPARSE) — the methodologically-correct sparse-vs-sparse comparison. Peak 13.53× vs rocBLAS (dense). 0% sparsity at near-parity (median 0.90×): no downside risk for drop-in adoption.

Model | Cells | Peak vs rocBLAS | Peak vs rocSPARSE | PASS
LLaMA-3.1-405B shapes ★ PEAK | 36 | 13.53× | 74.02× | 36/36
LLaMA-3.1 8B + 70B shapes | 72 | 11.88× | 66.50× | 72/72
DeepSeek-V3 671B/37B | 54 | 11.66× | 64.95× | 54/54
Qwen3-235B-A22B | 42 | 11.45× | 69.18× | 42/42
Qwen2.5-72B | 36 | 10.46× | 69.09× | 36/36
Mistral Large 3 | 54 | 10.22× | 65.44× | 54/54
Llama-4 Scout + Maverick | 54 | 10.21× | 67.20× | 54/54
Kimi K2 1T/32B active | 54 | 9.74× | 58.25× | 54/54
Microsoft Phi-4 14B | 42 | 9.59× | 61.22× | 42/42
OpenAI GPT-OSS 120B/20B | 42 | 9.04× | 57.33× | 42/42
Aggregate: 486 measurements · mean 2.16× vs rocBLAS · mean 32.64× vs rocSPARSE · median 30.93× vs rocSPARSE · 486/486 PASS
Plus: 10k×10k synthetic sweep on MI300X · 22 sparsity levels (0–99.9%) · 22/22 PASS · peak 8.5× rocBLAS / 83.77× rocSPARSE

AMD Instinct MI300X (192 GB HBM, ROCm 6.2) · FP32 dense baseline · CSR (rocSPARSE) sparse baseline above 70% sparsity · batch=512 · 200 iters per cell · ATOL=0.05 column-normalised · SHA-256 hashes (A, V, baseline, ROLV) + perturbation test on every cell. Sparsity sweep 0/50/70/80/90/95% per layer. 0% near-parity disclosure: peak 1.25×, median 0.90×, min 0.46× on AMD — ROLV at zero sparsity costs roughly 10% on average and gains rapidly as sparsity climbs.

2,170/2,170 total PASS across all GPU benchmarks (NVIDIA: 1,684 + AMD: 486) · BF16 · TF32 ON · ATOL=0.05 · Full per-cell CSV available on request

Model / Layer | CPU | Sparsity | vs MKL (iter) | vs MKL (total+build) | Energy↓
Mistral-7B q_proj [REAL] | Intel i7 | 95% | 21.45× | 18.58× | 95%
Qwen3-8B down_proj [REAL] ★ | Intel i7 | 95% | 20.86× | 17.88× | 95%
Gemma4-E4B up_proj [REAL] ★ | Intel i7 | 95% | 19.56× | 17.29× | 95%
Llama-3.1-8B q_proj [REAL] ★ | Intel i7 | 95% | 24.44× | 22.20× | 96%
Qwen2.5-7B gate_proj [REAL] ★ | Intel i7 | 95% | 59.70× | 98%
SmolLM2-1.7B · Qwen2.5-1.5B · Llama-3.2-1B on Colab Xeon · 125/125 PASS at 70–99% induced sparsity
SmolLM2-1.7B gate_proj [REAL] | Xeon Colab | 95% | 27.26× | 96%
Llama-3.2-1B down_proj [REAL] ★ PEAK | Intel i7 | 99% | 106.65× | 9.07× | 99%
Comprehensive Sweep · 4 models × 7 layers × 7 sparsities · 196/196 PASS · cosine≥0.999 + ATOL≤0.05 + perturbation
mistral-7B (all 7 layers) | Intel i7 | 0–99% | peak 18.31× | avg 6.67× | 98% | 49/49
llama-7B (all 7 layers) | Intel i7 | 0–99% | peak 25.27× | avg 7.02× | 98.8% | 49/49
qwen-7B (all 7 layers) | Intel i7 | 0–99% | peak 24.71× | avg 7.25× | 98.6% | 49/49
mixtral (all 7 layers) | Intel i7 | 0–99% | peak 20.50× | avg 7.77× | 98.6% | 49/49
Dispatcher chose: crcs_int8_outlier_extract 84/196 (low sp) · crcs_fp32 68/196 (mid-high sp) · extreme_skip 42/196 (extreme sp) · vendor_dense 2/196 (where vendor wins, ROLV binds to vendor — never slower)
TOTAL CPU: 9 individual models + 196-cell sweep = 528/528 PASS · Avg 7.18× (sweep mean) · Peak 106.65×

Intel i7 laptop (4 cores, 68GB RAM) · comprehensive sweep tested mistral-7B + llama-7B + qwen-7B + mixtral on real HuggingFace weights at 0/50/70/85/90/95/99% sparsity across q/k/v/o/gate/up/down projections · MKL baseline · batch=512 · 500 iters · ATOL≤0.05 + cosine≥0.999 + perturbation gate every cell · Strategy chosen by content-aware dispatcher · 196/196 sweep + 252/252 i7 (prior) + 125/125 Colab = 573/573 PASS

Hardware | Matrix | Sparsity | cuSPARSE / rocSPARSE ms | ROLV ms | ROLV wins
NVIDIA H200 | LLaMA up_proj | 80% | 5.90 | 0.619 | 9.53×
NVIDIA H200 | LLaMA up_proj | 90% | 3.01 | 0.348 | 8.66×
NVIDIA B200 | Mixtral-8×7B MoE | 75% | 25.65 | 0.234 | 109×
NVIDIA B200 | Llama-4-Scout MoE | 94% | 9.14 | 0.088 | 103×
NVIDIA B200 | 10k×10k synthetic | 70% | 4.31 | 0.36 | 12.06×
AMD MI300X · vs rocSPARSE (rocSPARSE = AMD’s cuSPARSE equivalent) · 10-model production portfolio
AMD MI300X | LLaMA-405B shapes ★ | 95% | 74.02×
AMD MI300X | Qwen3-235B-A22B | 95% | 69.18×
AMD MI300X | Qwen2.5-72B | 95% | 69.09×
AMD MI300X | Llama-4 Scout | 95% | 67.20×
AMD MI300X | LLaMA-3.1 8B/70B shapes | 95% | 66.50×
AMD MI300X | Mistral Large 3 | 95% | 65.44×
AMD MI300X | DeepSeek-V3 | 95% | 64.95×
AMD MI300X | Microsoft Phi-4 14B | 95% | 61.22×
AMD MI300X | Kimi K2 | 95% | 58.25×
AMD MI300X | OpenAI GPT-OSS | 95% | 57.33×
AMD MI300X | 10k×10k synthetic | 85% | 74.27 | 0.89 | 83.77×
AMD aggregate: 486 measurements · mean 32.64× · median 30.93× vs rocSPARSE · 486/486 PASS
Intel i7 CPU | Mistral-7B q_proj | 95% | 66.4 | 3.18 | 14.01×

cuSPARSE / rocSPARSE are NVIDIA’s and AMD’s own sparse libraries — the reference implementation for sparse linear algebra on each platform. ROLV beats both consistently across hardware and architectures because dense matmul on a compressed submatrix outperforms CSR index lookups on the structured sparsity patterns common in modern AI weights. AMD MI300X numbers come from real production-model layer shapes (LLaMA-405B, DeepSeek-V3, Qwen3-235B, etc.), 486/486 PASS with SHA-256 hash and perturbation verification on every cell.
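The mechanism is easy to see in miniature. A NumPy sketch (illustrative only, not the ROLV kernel): with row-structured sparsity, gathering the live rows and running a dense matmul on the compressed submatrix reproduces the full result with a fraction of the FLOPs and no per-element index lookups:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 1024, 1024, 64
A = rng.standard_normal((M, K)).astype(np.float32)
A[rng.random(M) < 0.95] = 0.0             # zero out ~95% of rows (row-sparse weights)
V = rng.standard_normal((K, N)).astype(np.float32)

# Dense baseline: computes every row, including the guaranteed-zero ones
Y_dense = A @ V

# Compressed-submatrix approach: gather live rows, dense matmul on just those
live = np.flatnonzero(np.abs(A).sum(axis=1) > 0)
Y = np.zeros((M, N), dtype=np.float32)
Y[live] = A[live] @ V                     # ~5% of the FLOPs, same output

assert np.allclose(Y, Y_dense)            # zero rows contribute exactly zero
```

The gathered submatrix stays dense, so the hardware's tuned GEMM path still applies; CSR formats instead pay an index lookup per stored element.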

Watertight Benchmark Format

Every claim, every cell, every baseline. Disclosed.

Each ROLV benchmark cell prints a 12-step audit-grade record. Inputs are SHA-256 hashed before computation. Outputs are SHA-256 hashed after. Multiple independent baselines are timed in the same session on the same hardware against the same inputs. Variance is disclosed (cv%). All accuracy gates are checked and reported. Nothing is summarised away. There is no cherry-picked baseline.

[1] IDENTIFICATION

Model name, weight source (REAL HuggingFace vs synthetic), exact layer name (e.g. model.layers.0.mlp.gate_proj), matrix dimensions M×K, batch size, target hardware. Reviewer can reproduce the exact cell.

[2] SPARSITY DETECTION

Natural sparsity %, active rows / total rows, active cols / total cols, FLOPs reduction %, RSMT™ threshold, dispatcher’s vendor selection, ROLVswitch™ path chosen. The reduction in compute is measured, not estimated.

[3] INPUT HASHES

SHA-256 of weight matrix A and SHA-256 of input vector V before any computation begins. Reviewer can verify these hashes match the public HuggingFace weights they pulled themselves. Forecloses any “you used different inputs” objection.
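The exact serialisation the harness hashes is not specified here; one representative way to hash a weight tensor deterministically before computation:

```python
import hashlib
import numpy as np

def tensor_sha256(x: np.ndarray) -> str:
    """Hash dtype, shape, and raw bytes so any change to the tensor changes the digest."""
    h = hashlib.sha256()
    h.update(str(x.dtype).encode())
    h.update(str(x.shape).encode())
    h.update(np.ascontiguousarray(x).tobytes())
    return h.hexdigest()

A = np.arange(12, dtype=np.float32).reshape(3, 4)
print(tensor_sha256(A))   # stable across runs and machines for identical tensors
```

A reviewer who pulls the same public weights and hashes them the same way gets the same digest, which is what makes the "different inputs" objection checkable.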

[4] OUTPUT HASHES

SHA-256 of vendor reference output Y_baseline and SHA-256 of ROLV output Y_ROLV. When path A wins, hashes are bit-identical. When path B wins, hashes differ but ATOL gate confirms numerical equivalence within tolerance.

[5] MULTI-BASELINE TIMING

Side-by-side ms/iter, p50, p99, stdev (cv%), GFLOPs/s, tokens/s for cuBLAS-FP32, cuBLAS-FP8, cuSPARSE, and ROLV in the same run. No baseline is hidden. cv% is published — if a measurement was noisy, it shows.

[6] CORRECTNESS GATES

max_abs_err, mean_abs_err, max_rel_err%, mean_rel_err%, ATOL gate (≤0.05 on column-normalised fp64), cosine gate (≥0.999), and perturbation gate. PASS / FAIL printed per gate. Cell is FAIL if any gate fails.
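The harness's precise normalisation is not spelled out beyond "column-normalised fp64"; one plausible reading of the ATOL and cosine gates:

```python
import numpy as np

def correctness_gates(y_ref, y_rolv, atol=0.05, cos_min=0.999):
    """ATOL gate on column-normalised fp64 error plus a global cosine gate."""
    a = y_ref.astype(np.float64)
    b = y_rolv.astype(np.float64)
    scale = np.linalg.norm(a, axis=0) + 1e-12          # per-column normalisation
    max_abs_err = float(np.max(np.abs(a - b) / scale))
    cos = float(a.ravel() @ b.ravel() /
                (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return {"atol_pass": max_abs_err <= atol,
            "cosine_pass": cos >= cos_min,
            "max_abs_err": max_abs_err, "cosine": cos}

y = np.random.default_rng(1).standard_normal((64, 8))
print(correctness_gates(y, y + 1e-4))   # tiny perturbation: both gates pass
```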

[7] APPLES-TO-APPLES SPEEDUP

ROLV vs the same-precision vendor baseline (e.g. ROLV path B INT8 vs INT8 cuBLASLt). When the apples-to-apples baseline is faster than ROLV, that is published unflinchingly — e.g. INT8 cuBLASLt 0.84× at sp=0% on H200. No flattering of comparisons.

[8] PRODUCTION DEPLOYMENT BASELINES

ROLV vs the modern stacks actually used in production: TensorRT-LLM INT8, FP8 cuBLAS on Hopper/Blackwell, structured 2:4 sparse, INT8 cuBLASLt, FP16, BF16. Six baselines, all timed in the same session, all reported.

[9] FP32 PRECISION REFERENCE

Speedup vs cuBLAS-FP32 separately reported, both as iter speedup and total speedup including build cost amortised over a single inference. Energy saved %, FLOPs saved %.
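The iter-vs-total distinction amortises the one-time build cost over the iterations it serves. A sketch with illustrative numbers, not values from the published tables:

```python
def speedups(base_ms: float, rolv_ms: float, build_ms: float, iters: int) -> dict:
    """Per-iteration speedup vs total speedup with the one-time build cost amortised."""
    return {"iter": base_ms / rolv_ms,
            "total": (base_ms * iters) / (build_ms + rolv_ms * iters)}

# Hypothetical cell: 21 ms baseline, 1 ms ROLV, 50 ms one-time build, 500 iterations
print(speedups(base_ms=21.0, rolv_ms=1.0, build_ms=50.0, iters=500))
# the total figure sits slightly below the per-iteration figure, as in the CPU tables
```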

[10] cuSPARSE SPARSE VENDOR REFERENCE

Speedup vs cuSPARSE separately reported. cuSPARSE is the NVIDIA-tuned sparse vendor library — the fairest direct sparse-vs-sparse comparison. Energy saved % vs cuSPARSE published.

[11] MEMORY vs COMPUTE EFFICIENCY

Weight bytes (dense vs ROLV), memory reduction %, bytes/FLOP ratio, compute density ratio. Addresses the “memory-bound LLM” reviewer critique — ROLV reduces both bytes moved and FLOPs performed in the same proportion. Speedup is from doing strictly less work, not from clever data layout.
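The proportionality claim can be checked directly: for row-sparse weights, bytes moved and FLOPs both scale with the live fraction, so the bytes/FLOP ratio is unchanged. A sketch, not the harness's actual accounting:

```python
def bytes_per_flop(M: int, K: int, N: int, live_fraction: float,
                   dtype_bytes: int = 2) -> tuple:
    """Compute density for dense weights vs the live-rows-only submatrix."""
    dense_bytes, dense_flops = M * K * dtype_bytes, 2 * M * K * N
    rolv_bytes = dense_bytes * live_fraction      # only live rows are moved
    rolv_flops = dense_flops * live_fraction      # only live rows are computed
    return dense_bytes / dense_flops, rolv_bytes / rolv_flops

# 25% live rows, chosen as an exact binary fraction so the ratios match exactly
d, r = bytes_per_flop(4096, 4096, 1024, live_fraction=0.25)
assert d == r   # identical compute density: less work, not a layout trick
```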

[12] FUSED-KERNEL DISCLOSURE

Lists fused-kernel paths available on the box (FlashAttention, Triton, FP8 hardware, etc.). Reviewer knows which vendor optimisations were enabled in the comparison. No silent disabling of vendor fast paths.

No breadcrumbs, no asterisks, no “up to” framing. Every cell is a complete record. Every claim on this site traces back to a JSON file with the 12 fields above. Reviewer reproduces the exact case on their hardware, computes the four hashes themselves, and compares to the published values. If the hashes match and the gates pass, the speedup number is what it is. The benchmark format leaves no room for selective reporting.

Calculators

Measure. Switch. Save.

Quantify ROLV's impact on your infrastructure. The two primary calculators below cover capital and operating expense; below them, three advanced tools for deeper analysis.

▲ Capex Savings Calculator
Current capex
$3.0B
Units saved
80,000
Capex saved
$2.4B
Speedup from published ROLV benchmarks on real model weights.
▲ Opex Savings — Energy Calculator
Total cost/yr
$76.5M
Saved/yr
$35M
3-year saving
$105M
CO₂ avoided/yr
117,000 t
Energy savings based on ROLV benchmark results at stated sparsity level.
Advanced tools
△ ROLV Unit™ — Measure True Compute Efficiency

The ROLV Unit™ is a normalised measure of compute efficiency that accounts for sparsity. Unlike TFLOPS (which measures peak theoretical throughput) or tokens/s (which conflates hardware and software), the ROLV Unit measures useful compute — work done on non-zero elements only.

1 ROLV Unit = 1 TFLOP of compute on live (non-zero) matrix elements per second, at full precision, verified by SHA-256 hash.
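The widget's arithmetic reduces to one line. A sketch with illustrative figures; the 2,250 TFLOPS peak and 75% sparsity are assumptions for the example, not benchmark results:

```python
def rolv_units(peak_tflops: float, sparsity: float) -> float:
    """Useful TFLOPs/s: throughput spent on live (non-zero) elements only."""
    return peak_tflops * (1.0 - sparsity)

peak = 2_250.0                      # hypothetical cluster peak, TFLOPS
print(rolv_units(peak, 0.75))       # dense kernel at 75% sparsity: 562.5 RU useful
print(rolv_units(peak, 0.0))        # ROLV computing only live elements: 2250.0 RU useful
```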

Your Compute in ROLV Units
Without ROLV
562 RU
wasted on zero rows
With ROLV
2,250 RU
all compute is useful
Cluster efficiency gain
4.0× more useful compute — same hardware
ROLV Unit = TFLOPS on verified non-zero elements. Vendor TFLOPS counts all compute including zero rows.
▶ ROLVswitch™ & VRAM — Crossover & Memory Calculator

ROLVswitch™ finds the exact sparsity where ROLV beats dense, and whether your matrix fits in VRAM.

ROLVswitch Analysis
Switch to ROLV above
VRAM analysis
At your sparsity
■ RSMT™ — Sparse Storage Threshold Calculator

RSMT™ finds the exact sparsity threshold where sparse storage beats dense for your dtype.

Loading...
Why RSMT™ Matters

The crossover point depends entirely on your dtype. With bfloat16 (2 bytes) and int32 indices (4 bytes), sparse format costs 3× more bytes per non-zero than dense. Sparse wins only when you have enough zeros to overcome the index overhead.

Your MoE models at bfloat16
Mixtral-8×7B: 75%  ✓ well above crossover
Qwen3-30B-A3B: 93.8%  ✓ far above crossover
Llama-4-Scout: 93.8%  ✓ far above crossover
DeepSeek-V3: 96.9%  ✓ extreme advantage

RSMT™ is computed analytically — no approximation.
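For CSR-style storage with one index per non-zero, the crossover is analytic. A sketch that ignores the small row-pointer array:

```python
def rsmt_crossover(dtype_bytes: int = 2, index_bytes: int = 4) -> float:
    """Sparsity above which per-non-zero sparse storage beats dense storage.

    Dense costs dtype_bytes per element; sparse costs dtype_bytes + index_bytes
    per stored non-zero, so sparse wins once the density drops below
    dtype_bytes / (dtype_bytes + index_bytes).
    """
    return 1.0 - dtype_bytes / (dtype_bytes + index_bytes)

print(rsmt_crossover(2, 4))   # bfloat16 + int32: crossover near 66.7% sparsity
print(rsmt_crossover(4, 4))   # fp32 + int32: crossover at 50%
```

The bfloat16 + int32 case is where the "3× more bytes per non-zero" figure comes from: (2 + 4) / 2 = 3, and all the MoE sparsity levels listed above clear the resulting ~66.7% crossover.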

Enterprise & Institutional Evaluation

Evaluate on your own hardware.
NDA-gated. Hardware-locked. Signed every run.

Two deployment tiers for serious evaluation on your own models, your own data, your own processors. If you just want to see ROLV working end-to-end first, the live benchmark above runs in under two minutes with no install. All enterprise runs are RolvKey™-signed — SHA-256 over your speedup, processor fingerprint, and a time-bounded attestation.

Recommended
Secure Container

RolvKey™ authenticated.
Hardware-locked Docker.

Evaluation licence + NDA. Container binds to your processor fingerprint at first run — will not execute on any other machine. Optional Intel SGX hardware encryption for regulated environments.

Contact rolv@rolv.ai →
Direct Hardware

No Docker.
Single authenticated file.

Bare-metal servers and air-gapped environments where Docker is not permitted. Processor-bound binary with live heartbeat attestation. Evaluation licence + NDA required.

Contact rolv@rolv.ai →
RolvKey™ — New IP — Patent Pending

A second invention, born from protecting the first.

In building the secure distribution system for ROLV Primitive© we developed a novel software protection architecture that we believe has standalone commercial value entirely apart from ROLV itself.

RolvKey™ uses a proprietary multi-layer mathematical key derivation system. Every key exchange is unique and time-bounded to a window of seconds. A captured response is worthless moments later. An attacker who somehow breaks the first layer immediately faces a second independent layer, then a third — each seeded with a completely different secret.

The only viable attack requires simultaneously compromising multiple independent systems within a narrow time window. For any commercial adversary this is not a realistic threat model.

Market opportunity

Every software company shipping proprietary compiled code faces the same distribution security problem. Current solutions — hardware dongles, standard license servers, code obfuscation — have well-documented weaknesses. The academic literature identified this specific application — software distribution key management and API attestation — as commercially unsolved. RolvKey™ addresses it.

Live right now

RolvKey™ is protecting ROLV Primitive© today. Every Docker container download, every key exchange, every benchmark run on every machine worldwide is secured by this system. It has been exercised thousands of times in production.

Licensing and partnership enquiries: rolv@rolv.ai

Independent Verification

Every result is independently verifiable.

4 SHA-256 hashes per case. Perturbation test on every result. ATOL=0.05 + cosine≥0.999 on column-normalised fp64. 1,684/1,684 NVIDIA GPU PASS · 486/486 AMD GPU PASS · 573/573 CPU PASS (incl. comprehensive 4-model × 7-layer × 7-sparsity sweep, 196/196). Download the full validation kit with harness code, raw outputs, and reproduction instructions.

↓ Full Benchmark PDF
About the Founder

One bike ride. Six months. A primitive that beats NVIDIA's own libraries.


Rolv Eitrem Heggenhougen — Norwegian-born entrepreneur, mathematics graduate, serial founder with companies built across Europe and the United States. In May 2025, on a bike ride in Fort Lauderdale, he saw that AI matrix operations were doing enormous amounts of unnecessary work. He could see it mathematically. He refused to stop until he had proven it.

"Imagination is the only limitation to innovation."

Contact

Contact Us

rolv@rolv.ai
Patent Pending · ROLV LLC · Fort Lauderdale, FL