rolvsparse© accelerates matrix operations on highly sparse weight matrices. It targets the 90%+ sparsity regime — the natural operating point of mixture-of-experts models and aggressively pruned transformers. Software-only. Runs on existing hardware.
rolvsparse© is a software operator that restructures matrix arithmetic to skip zero-valued multiply-accumulate operations. At high sparsity levels — where 90% or more of a weight matrix is zero — this approach delivers substantial reductions in compute time and energy consumption.
Works best when matrices are genuinely sparse. At 90%+ sparsity, the operator skips the vast majority of multiply-accumulate operations — the work simply does not happen.
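The skip-the-zeros idea can be made concrete with a minimal pure-Python sketch. This is not the rolvsparse© implementation (which is not public); it is a plain CSR-style product where arithmetic happens only for stored nonzeros:

```python
# Minimal sketch of a CSR-style sparse matrix-vector product.
# Zero entries are never stored, so they cost no multiply-accumulates.

def to_csr(dense):
    """Convert a dense 2D list into (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x, touching only the nonzero entries of A."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # one MAC per stored nonzero
        y.append(acc)
    return y

A = [[0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.5],
     [3.0, 0.0, 0.0, 0.0]]
vals, cols, ptr = to_csr(A)
y = csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0, 1.0])
# 3 MACs performed instead of the 12 a dense product would do.
```

At 75% sparsity this toy matrix already does a quarter of the dense work; at 99% sparsity the ratio is correspondingly more extreme.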
No hardware modifications. No new chips. No changes to model weights or architecture. Runs on existing CPU and GPU infrastructure.
Fewer operations mean less energy. At 90%+ sparsity, energy savings scale with the work eliminated, a direct consequence of doing less arithmetic.
The operator is built once from a weight matrix and then used repeatedly for inference. Build time is amortised across thousands of inference calls.
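The build-once, run-many pattern looks roughly like the following sketch, which uses scipy's CSR type as a stand-in for the rolvsparse© operator (the class name and structure are illustrative, not the actual API):

```python
# Hypothetical sketch of the build-once / run-many pattern,
# with scipy CSR standing in for the rolvsparse(c) operator.
import numpy as np
from scipy.sparse import csr_matrix

class SparseOperator:
    def __init__(self, dense_weights):
        # One-time build: compress the weight matrix to CSR.
        # This cost is paid once per weight matrix.
        self.W = csr_matrix(dense_weights)

    def __call__(self, x):
        # Per-inference call: reuse the prebuilt structure.
        return self.W @ x

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W[rng.random((64, 32)) < 0.95] = 0.0   # ~95% sparsity

op = SparseOperator(W)                 # built once
for _ in range(3):                     # reused across many calls
    y = op(rng.standard_normal(32))
```

Because the structure is fixed at build time, the per-call cost is dominated by the nonzero arithmetic itself, which is what makes amortisation over thousands of calls work.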
On correctness: rolvsparse© is an approximate operator. Pruning removes weight information, which introduces output error that grows with the sparsity level. This is expected and standard for compressed inference: the goal is to operate within a defined tolerance budget (typically normalized output error under 0.10) while maximising speed and energy savings. All published results include correctness metrics alongside speedup figures.
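The tolerance metric can be sketched directly. The weights below are synthetic and chosen so that pruning is benign (a few large entries plus small noise); they are not the published benchmark data, only an illustration of how the normalized error is computed and checked against the 0.10 budget:

```python
# Sketch of the correctness metric: normalized output error of a
# pruned (approximate) product against the dense baseline.
import numpy as np

def normalized_error(y_approx, y_dense):
    """||y_approx - y_dense|| / ||y_dense||."""
    return float(np.linalg.norm(y_approx - y_dense) / np.linalg.norm(y_dense))

rng = np.random.default_rng(1)
W = np.zeros((128, 64))
mask = rng.random((128, 64)) < 0.10           # ~10% significant weights
W[mask] = rng.standard_normal(int(mask.sum()))
W += 1e-3 * rng.standard_normal(W.shape)      # small noise makes W technically dense

x = rng.standard_normal(64)
y_dense = W @ x                               # dense baseline output

W_pruned = np.where(np.abs(W) >= 0.01, W, 0.0)  # magnitude pruning
err = normalized_error(W_pruned @ x, y_dense)
# err lands well inside the 0.10 tolerance budget for weights like these
```

Real model weights behave like this synthetic case only when most of their output-relevant mass sits in a small fraction of entries, which is exactly the regime the benchmarks target.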
All benchmarks use real open-source model weights downloaded directly from HuggingFace. Compared against dense matrix multiply as baseline. Results include normalized output error to verify correctness. Full JSON with hashes available on request.
4 MLP gate projection layers (14336×4096 each). 99% sparsity target. Batch=2048, 1000 iterations. Normalized output error 0.007–0.008 across all layers — well within tolerance.
| Layer | Sparsity | Speedup | Energy Saved | Norm Error | Result |
|---|---|---|---|---|---|
| layer0.gate_proj | 99.1% | 8.09× | 87.6% | 0.0073 | PASS |
| layer1.gate_proj | 99.2% | 7.73× | 87.1% | 0.0075 | PASS |
| layer2.gate_proj | 99.0% | 6.43× | 84.5% | 0.0077 | PASS |
| layer3.gate_proj | 99.0% | 2.89× | 65.3% | 0.0078 | PASS |
NVIDIA B200 · Mistral-7B-Instruct-v0.3 · Batch=2048 · 1000 iters · torch CSR · vs dense cuBLAS baseline · hash-verified
Same model, same layer, run on a standard desktop CPU. Demonstrates the operator works across hardware. Batch=512, 1000 iterations. Normalized output error 0.0073.
Intel i7 desktop CPU · Mistral-7B-Instruct-v0.3 · layer0.gate_proj (14336×4096) · Batch=512 · 1000 iters · scipy CSR · vs dense MKL baseline · hash-verified
Methodology: All results compare ROLV against dense matrix multiply on the same hardware. Correctness is measured as normalized output error against the dense baseline, with a tolerance of 0.10. Each run outputs four SHA-256 hashes — input matrix, input vector, dense baseline output, and ROLV output — to verify real computation on real weights. Full benchmark JSON with all hashes available on request.
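The hash-verification step amounts to hashing a deterministic byte encoding of each array. A stdlib-only sketch (the encoding and function name here are illustrative; the benchmark's exact serialisation may differ):

```python
# Sketch of the hash-verification step: SHA-256 over a deterministic
# byte encoding, so a run can be matched against published hashes.
import hashlib
import struct

def sha256_of_floats(values):
    """SHA-256 over a little-endian float64 encoding of a flat sequence."""
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack("<d", v))
    return h.hexdigest()

matrix_hash = sha256_of_floats([0.0, 2.0, 0.0, 1.5])
vector_hash = sha256_of_floats([1.0, 1.0, 1.0, 1.0])
# A benchmark run would emit four such hashes: input matrix, input
# vector, dense baseline output, and sparse-operator output.
```

Because the encoding is deterministic, anyone re-running the benchmark on the same downloaded weights reproduces the same hashes, which is what ties published speedups to real computation on real weights.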
rolvsparse© is not a general-purpose dense operator. It is a specialist tool for workloads where sparsity is high and the operator is applied repeatedly — conditions common in production AI inference.
MoE architectures activate a small fraction of experts per token, often fewer than 5%. Viewed as a single stacked weight matrix, the per-token computation is therefore effectively 95%+ sparse at inference time. rolvsparse© is designed for exactly this structure.
Post-training pruning at 90%+ sparsity creates weight matrices with the right structure for rolvsparse©. The operator complements magnitude pruning, structured pruning, and similar compression workflows.
CSR format begins to compete with dense MKL/cuBLAS around 75–80% sparsity. Below this threshold, dense operators typically win. rolvsparse© is honest about this boundary.
Below 70% sparsity, dense cuBLAS and MKL consistently outperform sparse operators on modern hardware. rolvsparse© does not claim otherwise and does not benchmark in this regime.
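A crossover like this naturally suggests measuring sparsity and dispatching accordingly. The sketch below illustrates the idea; the 0.80 cutoff is an assumed value drawn from the 75–80% crossover quoted above, not a rolvsparse© constant:

```python
# Illustrative dispatch: use a sparse path only past a sparsity
# threshold; below it, dense BLAS kernels typically win.

SPARSITY_CUTOFF = 0.80  # assumed cutoff based on the quoted crossover

def sparsity(matrix):
    """Fraction of exactly-zero entries in a dense 2D list."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0.0)
    return zeros / total

def choose_path(matrix):
    return "sparse" if sparsity(matrix) >= SPARSITY_CUTOFF else "dense"

mostly_zero = [[0.0] * 9 + [1.0]] * 4      # 90% zeros -> sparse path
dense_ish = [[1.0] * 8 + [0.0, 0.0]] * 4   # 20% zeros -> dense path
```

In practice the dispatch decision would be made once at operator-build time, since weight sparsity is fixed after pruning.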
rolvsparse© is covered by three US patent applications currently pending. The filings cover the core operator methodology and its application to AI inference workloads.
Core operator methodology for sparse matrix acceleration in AI inference pipelines.
Energy reduction techniques through elimination of zero-valued arithmetic operations.
Adaptive operator selection and parameter tuning for varying sparsity conditions.
All applications filed in the United States. Patent-pending status. Details available to qualified parties under NDA.
For technical enquiries, access to benchmark data, or discussions about the technology, please reach out directly.
Rolv E. Heggenhougen
rolv LLC · rolv.ai
@rolveitrem