MatMul Proof-of-Work
Specification of the BTX MatMul proof-of-work consensus algorithm: finite-field matrix multiplication, transcript compression, Freivalds verification, and DoS mitigation.
This reference is part of the BTX operator rollout.
Overview
BTX replaces traditional hash-based proof-of-work with MatMul PoW, an AI-native proof-of-work system based on matrix multiplication over a finite field. The protocol, derived from the cuPOW construction (Komargodski, Schen, Weinstein 2025), achieves a multiplicative overhead of only ~16.5% above a bare matrix multiply at the production parameters. Miners perform real linear-algebra computation rather than brute-force hashing, making the work inherently useful as a GPU compute benchmark and a stepping stone toward externally useful matrix workloads in a future v2.
Consensus Parameters
| Parameter | Symbol | Value | Role |
|---|---|---|---|
| Matrix dimension | n | 512 | Compute cost O(n³); ~134M field multiplications per attempt |
| Transcript block size | b | 16 | Hashing granularity; (n/b)³ = 32,768 intermediates per attempt |
| Noise rank | r | 8 | Security parameter; denoising cost O(n²r) |
| Field modulus | q | 2³¹ − 1 (2,147,483,647) | Mersenne prime M31; GPU-friendly int32 arithmetic |
| Pre-hash epsilon bits | ε | 10 (18 after height 55,000) | Sigma gate: pre-hash must beat the relaxed target (target « ε) before the expensive MatMul path |
| Freivalds rounds | k | 2 | False-positive probability < 2⁻⁶² |
| Validation window | — | 1,000 blocks | Depth beyond which full Phase 2 revalidation is not required |
| Block time | — | 90 s (steady-state); 250 ms (fast-mine, h < 50,000) | Target spacing for difficulty adjustment |
Note: b (transcript block size) and r (noise rank)
are independent parameters. Changing b affects transcript size and cache behavior;
changing r affects security conjecture strength and denoising overhead.
Finite-Field Arithmetic (FM31)
All arithmetic operates in the Mersenne prime field Fq where q = 2³¹ − 1. This prime was chosen for GPU friendliness: reduction is a single add-and-mask on int32 hardware, and the field supports native Apple Metal and CUDA int32 pipelines.
Reduction
The reduce64 function uses a double Mersenne fold, proven safe for all uint64
inputs (the single-fold variant is buggy for x ≥ 2⁶²). The dot()
accumulator is the only sanctioned path, applying per-step reduction with a
length-independent safety proof.
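The reduction described above can be sketched as follows. This is an illustrative model, not the BTX source: the exact fold sequence is an assumption, but it matches the stated properties (safe for all uint64 inputs, per-step reduction in the dot accumulator).

```python
M31 = (1 << 31) - 1  # field modulus q = 2^31 - 1

def reduce64(x: int) -> int:
    """Double Mersenne fold: reduce any 0 <= x < 2**64 mod 2^31 - 1.

    Two folds bring x below 2^31 + 7; one conditional subtract then yields
    the canonical representative in [0, q). A single fold is NOT enough for
    inputs >= 2^62, which is why the double fold is required.
    """
    x = (x >> 31) + (x & M31)   # first fold: result < 2^34
    x = (x >> 31) + (x & M31)   # second fold: result <= 2^31 + 6
    return x - M31 if x >= M31 else x

def dot(a, b):
    """Per-step-reduced dot product (the sanctioned accumulator path).

    In the int32/int64 analogue each term a_i*b_i < 2^62 and the running
    sum stays < 2^63, so the accumulator never overflows regardless of
    vector length.
    """
    acc = 0
    for ai, bi in zip(a, b):
        acc = reduce64(acc + ai * bi)
    return acc
```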
Matrix Generation from Seeds
Matrices A and B are deterministically expanded from 32-byte seeds carried in the block header. The expansion uses SHA-256 as a PRF with LE32 counters and rejection sampling for uniform distribution mod M31. Domain separation tags prevent cross-component collisions:
- matmul_noise_EL_v1 — left noise factor EL
- matmul_noise_ER_v1 — right noise factor ER
- matmul_noise_FL_v1 — left noise factor FL
- matmul_noise_FR_v1 — right noise factor FR
- matmul_compress_v1 — compression vector σ
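A minimal sketch of the expansion, assuming a plausible byte layout (tag ‖ seed ‖ LE32 counter); the tag names come from the list above, but the exact concatenation order is an assumption for illustration:

```python
import hashlib
import struct

M31 = (1 << 31) - 1

def expand_matrix(seed: bytes, tag: bytes, n: int):
    """Deterministically expand an n x n matrix over F_q from a 32-byte seed.

    SHA-256 is used as a PRF over (tag || seed || LE32 counter). Each digest
    yields eight 4-byte words; we mask to 31 bits and reject the single
    out-of-range value (q itself) so elements are uniform in [0, q).
    """
    out, counter = [], 0
    while len(out) < n * n:
        block = hashlib.sha256(tag + seed + struct.pack('<I', counter)).digest()
        counter += 1
        for i in range(0, 32, 4):
            word = int.from_bytes(block[i:i + 4], 'little') & 0x7FFFFFFF
            if word != M31:                    # rejection sampling
                out.append(word)
                if len(out) == n * n:
                    break
    return [out[r * n:(r + 1) * n] for r in range(n)]
```

The domain-separation tag is part of the PRF input, so the same seed yields independent matrices for each component.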
Mining Algorithm (Solve)
The mining loop for a single nonce attempt proceeds as follows:
- Sigma gate (pre-hash): Compute a cheap SHA-256 pre-hash from the candidate header. If the pre-hash does not satisfy the relaxed target (target « ε), skip the expensive MatMul entirely. This filters out ~99.9% of nonces at ε=10.
- Noise generation: Derive four low-rank factor matrices from the oracle seed σ: EL ∈ Fq^(n×r), ER ∈ Fq^(r×n), FL ∈ Fq^(n×r), FR ∈ Fq^(r×n). Form E = EL·ER and F = FL·FR (each rank ≤ r).
- Noisy MatMul: Compute C' = (A + E)·(B + F) using the canonical block decomposition with block size b=16. The computation iterates over (n/b)³ = 32,768 intermediate b×b partial-sum blocks in a fixed order (i, j, l).
- Transcript compression: Each intermediate block is compressed to a single field element via a random inner-product derived from σ. The compressed elements are fed into a rolling SHA-256d hasher, reducing hashed data from ~33.5 MB (naive) to ~131 KB.
- Difficulty check: If H(compressed transcript) < target, the nonce is valid.
- Denoise: Compute C = C' − A·F − E·(B + F). Emit the block with matmul_digest, seed_a, seed_b, and the product matrix C'.
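The sigma gate in step 1 can be modeled as a leading-zero-bit test, which matches the stated ~99.9% filter rate at ε=10 (pass rate ≈ 2⁻ᵉ); the real gate predicate also depends on the current difficulty target, which is simplified away here:

```python
import hashlib

def sigma_gate(header: bytes, nonce: int, eps: int = 10) -> bool:
    """Cheap pre-hash filter: require eps leading zero bits before the
    expensive O(n^3) MatMul path runs. Illustrative model of the gate."""
    h = hashlib.sha256(header + nonce.to_bytes(8, 'little')).digest()
    return (int.from_bytes(h, 'big') >> (256 - eps)) == 0
```

At ε=10 roughly 1 nonce in 1,024 survives the gate and proceeds to the noisy MatMul; raising ε to 18 at height 55,000 tightens the filter by a further factor of 256.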
Transcript Compression
Naive full-block hashing would require SHA-256 over ~33.5 MB per nonce attempt, adding 30–100% overhead on GPU miners. BTX uses field-algebraic inner-product compression to reduce this by ~256×:
| Approach | Bytes Hashed | SHA-256 Time (SHA-NI) | Overhead vs MatMul |
|---|---|---|---|
| Naive full-block | 33.5 MB | ~67 ms | 30–100% |
| Field-algebraic compression | ~131 KB | ~0.26 ms | < 1% |
Security is preserved: the random inner-product family is pairwise independent, so forging a different b×b block that produces the same compressed element requires guessing a field element (probability 1/q ≈ 2⁻³¹ per intermediate). A single σ-derived compression vector is reused across all 32,768 intermediates; Carter-Wegman per-intermediate binding plus σ-unpredictability makes per-intermediate vectors unnecessary.
On GPU architectures, the compression dot-product is fused into the matmul kernel to avoid the ~33.5 MiB data transfer penalty. Compressed elements (~128 KiB total) are written to a ring buffer and consumed by a CPU SHA-256 thread.
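The compression pipeline can be sketched as below: each flattened b×b block collapses to one field element via an inner product with the σ-derived vector, and the elements feed a rolling SHA-256d. How sigma_vec is derived (the matmul_compress_v1 expansion) is elided here:

```python
import hashlib

M31 = (1 << 31) - 1

def compress_block(block_flat, sigma_vec):
    """Inner product of a flattened b x b block (b*b = 256 elements at b=16)
    with the sigma-derived compression vector, over F_q."""
    acc = 0
    for x, s in zip(block_flat, sigma_vec):
        acc = (acc + x * s) % M31
    return acc

def transcript_digest(blocks, sigma_vec):
    """Rolling SHA-256d over the compressed elements, 4 bytes each.

    At production parameters: 32,768 intermediates x 4 bytes ~= 131 KB hashed,
    versus ~33.5 MB for naive full-block hashing (~256x reduction).
    """
    h = hashlib.sha256()
    for blk in blocks:
        h.update(compress_block(blk, sigma_vec).to_bytes(4, 'little'))
    return hashlib.sha256(h.digest()).digest()  # SHA-256d = double SHA-256
```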
Verification (Freivalds' Algorithm)
Full verification requires recomputing the entire O(n³) transcript — the dominant engineering constraint. BTX uses a two-phase validation model:
Phase 1: Header Checks (O(1))
- Validate header format, timestamp, and nBits target.
- Verify matmul_digest matches the claimed SHA-256d of the compressed transcript.
- Check proof-of-work: H(header) < target.
Phase 2: Transcript Recomputation (O(n³))
- Reconstruct A, B from seeds. Regenerate noise. Recompute the canonical block MatMul.
- Verify the compressed transcript matches matmul_digest.
- Apply Freivalds' algorithm as a fast product check: verify (A+E)·((B+F)·r) = C'·r for k=2 random vectors r. Each round has false-positive probability 1/(2³¹ − 1); with k=2, error < 2⁻⁶².
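The Freivalds check above can be sketched directly: three O(n²) matrix-vector products per round replace the O(n³) product recomputation. Matrices are plain row-major lists here; the real node operates on the reconstructed (A+E) and (B+F):

```python
import random

M31 = (1 << 31) - 1

def matvec(m, v):
    """Matrix-vector product over F_q, O(n^2)."""
    return [sum(x * y for x, y in zip(row, v)) % M31 for row in m]

def freivalds(left, right, c, k=2, rng=random):
    """Accept iff left * (right * r) == c * r for k independent random r.

    A correct product always passes; a wrong product passes a round only
    if r lands in the kernel of the error matrix, probability 1/q, so the
    false-accept probability after k rounds is < (1/q)^k (< 2^-62 at k=2).
    """
    n = len(c)
    for _ in range(k):
        r = [rng.randrange(M31) for _ in range(n)]
        if matvec(left, matvec(right, r)) != matvec(c, r):
            return False  # definite mismatch: c != left * right
    return True
```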
Graduated Peer Punishment
Phase 1 pass followed by Phase 2 fail triggers graduated response: disconnect →
discourage → ban (after 3rd offense in 24h). The fMatMulStrictPunishment
flag enables immediate bans for post-stabilization networks.
DoS Mitigation
Phase 2 verification at n=512 costs approximately 5 ms per block. Rate limits bound CPU exposure:
- Per-peer budget: 32 Phase 2 verifications per minute per peer.
- Global budget: 512 Phase 2 verifications per minute across all peers (~2.56 s CPU/min).
- Validation window: Blocks deeper than 1,000 from the tip skip Phase 2 (assumevalid model for IBD).
- Fast-phase scheduling: During the 250 ms fast-mine phase, Phase 2 is deferred to a bounded queue that must drain completely after the transition to 90 s blocks.
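The per-peer and global budgets above can be modeled as fixed-window counters; the window granularity and clock source here are illustrative assumptions, not the node's actual implementation:

```python
import time
from collections import defaultdict

PER_PEER_BUDGET = 32    # Phase 2 verifications per minute per peer
GLOBAL_BUDGET = 512     # Phase 2 verifications per minute, all peers
WINDOW = 60.0           # seconds

class Phase2Limiter:
    """Fixed-window rate limiter bounding Phase 2 CPU exposure
    (~512 x 5 ms = 2.56 s CPU per minute at the global cap)."""

    def __init__(self, now=time.monotonic):
        self.now = now
        self.window_start = now()
        self.per_peer = defaultdict(int)
        self.total = 0

    def allow(self, peer_id) -> bool:
        t = self.now()
        if t - self.window_start >= WINDOW:   # roll over to a fresh window
            self.window_start = t
            self.per_peer.clear()
            self.total = 0
        if self.total >= GLOBAL_BUDGET or self.per_peer[peer_id] >= PER_PEER_BUDGET:
            return False                      # defer or drop the verification
        self.per_peer[peer_id] += 1
        self.total += 1
        return True
```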
Block Header Fields
The MatMul block header extends the standard Bitcoin header with:
| Field | Size | Description |
|---|---|---|
| nNonce64 | 8 bytes | 64-bit mining nonce |
| matmul_digest | 32 bytes | SHA-256d of compressed transcript |
| matmul_dim | 2 bytes | Matrix dimension (512) |
| seed_a | 32 bytes | Deterministic seed for matrix A |
| seed_b | 32 bytes | Deterministic seed for matrix B |
Block size with seed-derived matrices is approximately 200 KiB, compared to ~2.2 MiB if full matrices were included. All serialization uses little-endian byte order; big-endian architectures are unsupported.
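The extension fields serialize as below. The field sizes and little-endian requirement come from the table; the concatenation order is an assumption for illustration:

```python
import struct

def pack_matmul_fields(nonce64: int, matmul_digest: bytes, matmul_dim: int,
                       seed_a: bytes, seed_b: bytes) -> bytes:
    """Serialize the MatMul header extension, little-endian throughout.

    Total: 8 + 32 + 2 + 32 + 32 = 106 bytes appended to the standard header.
    """
    assert len(matmul_digest) == 32 and len(seed_a) == 32 and len(seed_b) == 32
    return (struct.pack('<Q', nonce64)      # 64-bit mining nonce
            + matmul_digest                 # SHA-256d of compressed transcript
            + struct.pack('<H', matmul_dim) # matrix dimension (512)
            + seed_a                        # deterministic seed for A
            + seed_b)                       # deterministic seed for B
```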
Overhead Analysis
At production parameters (n=512, b=16, r=8), total protocol overhead is ~16.5% above a bare n×n matrix multiply:
| Component | Cost | % of Baseline |
|---|---|---|
| Noisy MatMul (A+E)·(B+F) | 134M muls | baseline |
| Transcript compression dot-products | 8.4M muls | 6.3% |
| Denoising (rank-r factor products) | 8.4M muls | 6.3% |
| Noise generation + matrix additions | 4.7M ops | 3.5% |
| Rolling SHA-256d on compressed elements | 131 KB hashed | < 0.1% |
| Total overhead | ~22M muls | ~16.5% |
On GPUs where the compression dot-product is fused into the matmul kernel, measured total overhead drops to ~10–12%.
Difficulty Adjustment Interaction
MatMul PoW difficulty is governed by the ASERT algorithm (see the Difficulty Adjustment specification). The sigma pre-hash gate interacts with the difficulty target: at higher difficulty, the pre-hash filter rejects more nonces before the expensive MatMul path, keeping average GPU utilization efficient. The pre-hash epsilon upgrade at height 55,000 (from 10 to 18 bits) further reduces wasted compute as the chain matures.
GPU Optimization
The field choice (M31 = 2³¹ − 1) was selected specifically for GPU friendliness. Key optimization points:
- M31 reduction is a single add-and-mask on int32 ALUs, native on Apple Metal and CUDA.
- The compression dot-product must be fused into the matmul kernel to avoid a ~33.5 MiB device-to-host transfer per attempt.
- Compressed elements (~128 KiB) are written to a ring buffer consumed asynchronously by a CPU SHA-256 thread.
- The 512×512 working set (~2 MiB) fits comfortably in GPU L2 cache on modern hardware.
- Denoising runs only once per found block (not per attempt), so its O(n²r) cost is negligible in practice.
Security Properties
- Completeness: An honest miner performing the full MatMul is accepted with a per-attempt probability determined by the difficulty target.
- Hardness: Under the direct-product conjecture for random rank-r matrix products, no adversary gains meaningful advantage with less than a full matrix multiplication.
- Collision resistance: Per-intermediate collision probability is 2⁻³¹; per-block union bound gives ≤ 2⁻¹⁶. A dual-projection upgrade path exists for 2⁻⁶² per intermediate if future risk assessment warrants.
- ASIC resistance: ASICs are not cryptographically impossible but are economically misaligned with commodity AI/GPU hardware. The MatMul workload is identical to standard GEMM operations that AI accelerators are optimized for.
Monetary Policy
| Parameter | Value |
|---|---|
| Total supply | 21,000,000 BTX |
| Initial block subsidy | 20 BTX |
| Halving interval | 525,000 blocks |
| Fast-mine emission (blocks 0–49,999) | 1,000,000 BTX (4.76% of supply) |
The issuance sums an infinite geometric series: 20 BTX × 525,000 blocks × (1 + ½ + ¼ + …) = 20 × 525,000 × 2 = 21,000,000 BTX.
MoneyRange() and GetBlockSubsidy() enforce the hard cap at consensus level.
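The supply arithmetic can be checked with a Bitcoin-style halving schedule; this sketch ignores satoshi-level integer rounding, which the real GetBlockSubsidy() must handle:

```python
HALVING_INTERVAL = 525_000
INITIAL_SUBSIDY = 20.0  # BTX

def block_subsidy(height: int) -> float:
    """Subsidy halves every HALVING_INTERVAL blocks; zero after 64 halvings."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_SUBSIDY / 2 ** halvings if halvings < 64 else 0.0

# Geometric series: 20 * 525,000 * (1 + 1/2 + 1/4 + ...) -> 21,000,000 BTX
total = sum(block_subsidy(k * HALVING_INTERVAL) * HALVING_INTERVAL
            for k in range(64))
```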