Skip to content

Zero-Knowledge Proof of Contribution (ZK-PoC)

Technical Design for Privacy-Preserving Federated Learning on Holochain

Date: October 1, 2025 Status: Research & Design Phase Goal: Enable Holochain validators to verify FL gradient quality WITHOUT seeing private data


🎯 The Problem: Privacy-Validation Paradox

Holochain's Validation Requirement

Holochain's DHT requires public validation - all peers validate all entries:

#[hdk_extern]
pub fn validate(op: Op) -> ExternResult<ValidateCallbackResult> {
    // PROBLEM: Validator needs to check gradient quality
    // BUT: Can't see private training data (HIPAA, GDPR, etc.)

    match op {
        Op::StoreEntry(StoreEntry { entry, .. }) => {
            // How do we validate WITHOUT seeing private data?
            validate_gradient_quality(gradient)?  // ❌ Exposes data
        }
    }
}

Federated Learning's Privacy Requirement

FL participants have sensitive data they cannot share: - Medical: Patient records (HIPAA violation) - Financial: Transaction history (PCI-DSS violation) - Personal: User behavior (GDPR violation)

The Paradox

Holochain Needs: Public validation of gradient quality
FL Needs: Private training data that can't be revealed

❌ Traditional Solution: Pick one (lose privacy OR lose validation)
βœ… ZK-PoC Solution: Prove quality WITHOUT revealing data

🧬 Solution Architecture: Zero-Knowledge Proofs

What is a Zero-Knowledge Proof?

A cryptographic proof that allows one party (prover) to convince another party (verifier) that a statement is true, without revealing any information beyond the truth of the statement.

Example: - Statement: "I computed a good gradient from valid data" - Traditional proof: Share data + gradient β†’ Verifier checks β†’ βœ… or ❌ - ZK proof: Generate cryptographic proof β†’ Verifier checks proof β†’ βœ… or ❌ - Result: Verifier learns NOTHING about data, only that statement is true

ZK-PoC for Federated Learning

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    FL Node (Prover)                         β”‚
β”‚  β€’ Has private dataset D                                    β”‚
β”‚  β€’ Computes gradient G from D                               β”‚
β”‚  β€’ Generates ZK proof Ο€ proving:                            β”‚
β”‚    1. G was computed from valid dataset D                   β”‚
β”‚    2. D meets statistical requirements (size, distribution) β”‚
β”‚    3. G improves model on public validation set             β”‚
β”‚  β€’ Publishes (G, Ο€) to Holochain DHT                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                Holochain Validators (Verifiers)             β”‚
β”‚  β€’ Receive (G, Ο€) from DHT                                  β”‚
β”‚  β€’ Verify proof Ο€ WITHOUT seeing D                          β”‚
β”‚  β€’ Accept/reject gradient based on proof                    β”‚
β”‚  β€’ NO access to private data D                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”¬ Technical Implementation Options

Option 1: zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge)

Best For: Complex computations, minimal proof size

Libraries: - Rust: ark-works (most mature) - Rust: bellman (used by Zcash) - JavaScript: snarkjs (for web UIs)

Pros: - βœ… Tiny proofs (few hundred bytes) - βœ… Fast verification (<10ms on CPU) - βœ… Handles complex logic (PyTorch forward passes)

Cons: - ❌ Trusted setup required (ceremony with 100+ participants) - ❌ Slow proof generation (10-60 seconds for gradient proof) - ❌ High memory usage (2-8 GB RAM for prover)

Use Case: Medical/Finance where trust is critical, can tolerate slow proof generation


Option 2: zk-STARKs (Zero-Knowledge Scalable Transparent Arguments of Knowledge)

Best For: Transparency (no trusted setup), post-quantum security

Libraries: - Rust: winterfell (Facebook's library) - Rust: plonky2 (fastest STARK prover)

Pros: - βœ… NO trusted setup (transparent) - βœ… Post-quantum secure (future-proof) - βœ… Faster proof generation than SNARKs (5-30 seconds)

Cons: - ❌ Larger proofs (10-50 KB vs 200 bytes for SNARKs) - ❌ Slower verification (50-100ms vs 10ms for SNARKs) - ❌ Less mature ecosystem

Use Case: Public blockchain audit trails, long-term archival


Option 3: Bulletproofs (Range Proofs)

Best For: Simple range/bound checks, no trusted setup

Libraries: - Rust: bulletproofs

Pros: - βœ… NO trusted setup - βœ… Fast for range proofs (e.g., "gradient norm < threshold") - βœ… Small proofs for simple statements

Cons: - ❌ Limited to range proofs (not general computation) - ❌ Can't prove complex PyTorch logic - ❌ Linear verification time (slower for complex statements)

Use Case: Lightweight gradient norm checks, Byzantine detection


Phase 1: Bulletproofs for Simple Checks (MVP)

What to Prove: 1. Gradient norm is within bounds: ||G|| ∈ [min, max] 2. Dataset size is sufficient: |D| β‰₯ min_size 3. Training loss decreased: loss_after < loss_before

Implementation:

use bulletproofs::{BulletproofGens, PedersenGens, RangeProof};
use curve25519_dalek::scalar::Scalar;

pub struct GradientProof {
    pub norm_proof: RangeProof,
    pub size_proof: RangeProof,
    pub loss_proof: RangeProof,
}

impl GradientProof {
    /// Generate proof for gradient contribution
    pub fn generate(
        gradient: &Gradient,
        dataset_size: u64,
        loss_improvement: f64,
    ) -> Self {
        let bp_gens = BulletproofGens::new(64, 1);
        let pc_gens = PedersenGens::default();

        // Prove gradient norm is in range [0, max_norm]
        let norm = gradient.norm();
        let norm_proof = RangeProof::prove_single(
            &bp_gens,
            &pc_gens,
            &mut OsRng,
            norm as u64,
            &Scalar::random(&mut OsRng),
            64,  // 64-bit range
        ).unwrap();

        // Prove dataset size is at least min_size
        let size_proof = RangeProof::prove_single(
            &bp_gens,
            &pc_gens,
            &mut OsRng,
            dataset_size,
            &Scalar::random(&mut OsRng),
            64,
        ).unwrap();

        // Prove loss improvement
        let loss_proof = RangeProof::prove_single(
            &bp_gens,
            &pc_gens,
            &mut OsRng,
            (loss_improvement * 1000.0) as u64,  // Scale to integer
            &Scalar::random(&mut OsRng),
            64,
        ).unwrap();

        GradientProof {
            norm_proof,
            size_proof,
            loss_proof,
        }
    }

    /// Verify proof on Holochain validator
    pub fn verify(
        &self,
        max_norm: u64,
        min_dataset_size: u64,
        min_loss_improvement: u64,
    ) -> bool {
        let bp_gens = BulletproofGens::new(64, 1);
        let pc_gens = PedersenGens::default();

        // Verify all three proofs
        let norm_valid = self.norm_proof.verify_single(
            &bp_gens,
            &pc_gens,
            &commitment_norm,
            64,
        ).is_ok();

        let size_valid = self.size_proof.verify_single(
            &bp_gens,
            &pc_gens,
            &commitment_size,
            64,
        ).is_ok();

        let loss_valid = self.loss_proof.verify_single(
            &bp_gens,
            &pc_gens,
            &commitment_loss,
            64,
        ).is_ok();

        norm_valid && size_valid && loss_valid
    }
}

Integration with Holochain:

use hdk::prelude::*;

#[hdk_entry_helper]
pub struct GradientSubmission {
    pub gradient_hash: String,
    pub proof: SerializedProof,
    pub metadata: GradientMetadata,
}

#[hdk_extern]
pub fn validate(op: Op) -> ExternResult<ValidateCallbackResult> {
    match op {
        Op::StoreEntry(StoreEntry { entry, .. }) => {
            match entry {
                Entry::App(bytes) => {
                    let submission: GradientSubmission = deserialize(bytes)?;

                    // Verify ZK proof WITHOUT seeing private data
                    let proof = GradientProof::deserialize(&submission.proof)?;

                    if proof.verify(MAX_NORM, MIN_SIZE, MIN_IMPROVEMENT) {
                        Ok(ValidateCallbackResult::Valid)
                    } else {
                        Ok(ValidateCallbackResult::Invalid(
                            "ZK proof verification failed".into()
                        ))
                    }
                }
                _ => Ok(ValidateCallbackResult::Valid),
            }
        }
        _ => Ok(ValidateCallbackResult::Valid),
    }
}

Performance: - Proof size: ~700 bytes (3 range proofs) - Proof time: ~50ms (on CPU) - Verification time: ~10ms (on CPU) - βœ… Production-ready: Can handle 100+ gradients/second


Phase 2: zk-SNARKs for Full Validation (Advanced)

What to Prove (more complex): 1. Gradient computed via correct training algorithm 2. Dataset passes statistical tests (distribution, no outliers) 3. Model improves on PUBLIC validation set (prevents data poisoning)

Circuit Design (Arkworks):

use ark_ff::Field;
use ark_r1cs_std::prelude::*;
use ark_relations::r1cs::{ConstraintSynthesizer, ConstraintSystemRef, SynthesisError};

/// ZK circuit proving gradient quality
pub struct GradientQualityCircuit<F: Field> {
    // Public inputs (visible to verifier)
    pub public_validation_set: Vec<F>,
    pub model_weights_hash: F,
    pub min_accuracy: F,

    // Private inputs (hidden from verifier)
    pub private_dataset: Vec<F>,
    pub computed_gradient: Vec<F>,
    pub training_loss: F,
}

impl<F: Field> ConstraintSynthesizer<F> for GradientQualityCircuit<F> {
    fn generate_constraints(
        self,
        cs: ConstraintSystemRef<F>,
    ) -> Result<(), SynthesisError> {
        // Allocate public inputs
        let validation_set = Vec::new_input(cs.clone(), || Ok(self.public_validation_set))?;
        let model_hash = FpVar::new_input(cs.clone(), || Ok(self.model_weights_hash))?;
        let min_acc = FpVar::new_input(cs.clone(), || Ok(self.min_accuracy))?;

        // Allocate private inputs (witnesses)
        let dataset = Vec::new_witness(cs.clone(), || Ok(self.private_dataset))?;
        let gradient = Vec::new_witness(cs.clone(), || Ok(self.computed_gradient))?;
        let loss = FpVar::new_witness(cs.clone(), || Ok(self.training_loss))?;

        // Constraint 1: Dataset is valid size
        enforce_dataset_size(&dataset, 100)?;

        // Constraint 2: Gradient was computed correctly
        let computed_grad = compute_gradient(&dataset, &model_hash)?;
        computed_grad.enforce_equal(&gradient)?;

        // Constraint 3: Model improves on validation set
        let accuracy = evaluate_on_validation(&gradient, &validation_set)?;
        accuracy.enforce_cmp(&min_acc, std::cmp::Ordering::Greater, true)?;

        Ok(())
    }
}

// Helper: Enforce dataset has minimum size
fn enforce_dataset_size<F: Field>(
    dataset: &[FpVar<F>],
    min_size: usize,
) -> Result<(), SynthesisError> {
    if dataset.len() < min_size {
        return Err(SynthesisError::Unsatisfiable);
    }
    Ok(())
}

// Helper: Compute gradient in-circuit (simplified)
fn compute_gradient<F: Field>(
    dataset: &[FpVar<F>],
    model_hash: &FpVar<F>,
) -> Result<Vec<FpVar<F>>, SynthesisError> {
    // Simplified: In reality, this would implement full forward/backward pass
    // For MVP, we can use a hash-based commitment
    let mut gradient = Vec::new();
    for data_point in dataset {
        let grad_component = data_point * model_hash;
        gradient.push(grad_component);
    }
    Ok(gradient)
}

// Helper: Evaluate model on validation set
fn evaluate_on_validation<F: Field>(
    gradient: &[FpVar<F>],
    validation_set: &[FpVar<F>],
) -> Result<FpVar<F>, SynthesisError> {
    // Compute accuracy on validation set
    let mut correct = FpVar::zero();
    for (grad, val) in gradient.iter().zip(validation_set) {
        let prediction = grad * val;
        // Simplified: Real implementation would use loss function
        correct += prediction;
    }
    Ok(correct)
}

Proof Generation (FL Node):

use ark_groth16::{Groth16, ProvingKey, Proof};
use ark_bn254::{Bn254, Fr};
use ark_std::rand::Rng;

pub fn generate_gradient_proof(
    gradient: &Gradient,
    dataset: &Dataset,
    validation_set: &ValidationSet,
) -> Result<Proof<Bn254>, ZKError> {
    // Load proving key (generated during trusted setup)
    let proving_key = load_proving_key()?;

    // Create circuit instance
    let circuit = GradientQualityCircuit {
        public_validation_set: validation_set.to_field_elements(),
        model_weights_hash: gradient.model_hash,
        min_accuracy: Fr::from(90), // 90% minimum accuracy

        private_dataset: dataset.to_field_elements(),
        computed_gradient: gradient.to_field_elements(),
        training_loss: Fr::from(gradient.loss * 1000.0),
    };

    // Generate proof (takes 10-60 seconds)
    let mut rng = rand::thread_rng();
    let proof = Groth16::<Bn254>::prove(&proving_key, circuit, &mut rng)?;

    Ok(proof)
}

Verification (Holochain Validator):

use ark_groth16::{Groth16, VerifyingKey, Proof};
use ark_bn254::Bn254;

pub fn verify_gradient_proof(
    proof: &Proof<Bn254>,
    public_inputs: &[Fr],
) -> Result<bool, ZKError> {
    // Load verifying key (public, no trusted setup needed)
    let verifying_key = load_verifying_key()?;

    // Verify proof (takes ~10ms)
    let valid = Groth16::<Bn254>::verify(&verifying_key, public_inputs, proof)?;

    Ok(valid)
}

Performance: - Proof size: ~200 bytes (zk-SNARK) - Proof time: 10-60 seconds (depending on circuit complexity) - Verification time: ~10ms (on CPU) - βœ… Production-ready: Can handle 10-50 proofs/minute per node


πŸ› οΈ Implementation Roadmap

Phase 1: Bulletproofs MVP (2-3 months)

Goal: Prove basic gradient properties without revealing data

Tasks: 1. Research & Design (2 weeks) - Finalize proof requirements - Choose Bulletproofs library - Design proof schema

  1. Python Integration (3 weeks)
  2. Wrap Rust bulletproofs library for Python
  3. Integrate with Zero-TrustML PyTorch code
  4. Test with real gradients

  5. Holochain Integration (3 weeks)

  6. Update credits zome with ZK validation
  7. Implement proof serialization
  8. Test end-to-end

  9. Testing & Optimization (2 weeks)

  10. Benchmark proof generation/verification
  11. Test with 100+ nodes
  12. Optimize for performance

Deliverables: - Working ZK-PoC with Bulletproofs - <100ms proof generation - <10ms verification - Integrated with Zero-TrustML

Phase 2: zk-SNARKs Advanced (4-6 months)

Goal: Prove full gradient computation correctness

Tasks: 1. Trusted Setup Ceremony (4 weeks) - Design ceremony protocol - Recruit 100+ participants - Generate proving/verifying keys

  1. Circuit Development (8 weeks)
  2. Design gradient quality circuit
  3. Implement in Arkworks
  4. Test with real PyTorch models

  5. Integration (6 weeks)

  6. Python wrapper for proof generation
  7. Holochain validation logic
  8. End-to-end testing

  9. Production Deployment (4 weeks)

  10. Security audit
  11. Performance optimization
  12. Documentation

Deliverables: - Full zk-SNARK implementation - <60s proof generation - <10ms verification - Production-ready for medical/finance


πŸ“Š Cost-Benefit Analysis

Option A: No ZK Proofs (Current)

Pros: - βœ… Simple implementation - βœ… Fast (no proof overhead)

Cons: - ❌ Can't use Holochain DHT (no validation possible) - ❌ Limited to centralized storage (PostgreSQL) - ❌ No immutable audit trail - ❌ Trust required in validators

Use Cases: Research, low-stakes FL


Option B: Bulletproofs (Phase 1)

Pros: - βœ… Fast proof generation (<100ms) - βœ… Small proofs (~700 bytes) - βœ… NO trusted setup - βœ… Production-ready in 2-3 months

Cons: - ❌ Limited verification (ranges only) - ❌ Can't prove full computation - ❌ Vulnerable to sophisticated attacks

Use Cases: General FL, warehouse robotics, content networks


Option C: zk-SNARKs (Phase 2)

Pros: - βœ… Full computation verification - βœ… Smallest proofs (~200 bytes) - βœ… Fast verification (<10ms) - βœ… Maximum security

Cons: - ❌ Slow proof generation (10-60s) - ❌ Requires trusted setup ceremony - ❌ 6-9 months development time

Use Cases: Medical, finance, autonomous vehicles, safety-critical


🎯 Recommendation: Phased Approach

Immediate (Phase 9): PostgreSQL

Decision: Deploy FL with PostgreSQL backend Rationale: Unblocks user deployment, proves economics Timeline: Deploy now

Phase 10: Bulletproofs MVP

Decision: Add Bulletproofs-based ZK-PoC Rationale: Enables Holochain DHT with basic security Timeline: 2-3 months after Phase 9

Phase 11+: zk-SNARKs

Decision: Upgrade to full zk-SNARKs for critical industries Rationale: Medical/finance need maximum security Timeline: 6-9 months after Phase 10


πŸ“š Resources

Libraries

Bulletproofs: - https://github.com/dalek-cryptography/bulletproofs - Tutorial: https://doc.dalek.rs/bulletproofs/

zk-SNARKs (Arkworks): - https://arkworks.rs/ - Tutorial: https://github.com/arkworks-rs/r1cs-tutorial

Holochain Integration: - HDK Documentation: https://docs.rs/hdk/latest/hdk/ - Validation Guide: https://developer.holochain.org/concepts/6_validation/

Academic Papers

  1. Bulletproofs: BΓΌnz et al. (2018) - https://eprint.iacr.org/2017/1066.pdf
  2. Groth16 SNARKs: Groth (2016) - https://eprint.iacr.org/2016/260.pdf
  3. ZK for FL: Bonawitz et al. (2019) - "Towards Federated Learning at Scale"

Example Implementations

  • ZK-Rollups: https://github.com/matter-labs/zksync
  • Zcash: https://github.com/zcash/zcash (privacy-preserving blockchain)
  • Worldcoin: Uses SNARKs for proof-of-personhood

πŸ” Security Considerations

Threat 1: Malicious Prover Fakes Proof

Attack: Node generates proof for fake data Mitigation: Public validation set (verifier checks accuracy on PUBLIC data)

Threat 2: Validator Collusion

Attack: Validators accept invalid proofs Mitigation: DHT-based validation (majority required)

Threat 3: Proof Replay

Attack: Reuse old proof for new gradient Mitigation: Include timestamp + nonce in proof

Threat 4: Side-Channel Leaks

Attack: Timing/memory usage reveals private data Mitigation: Constant-time operations, blinding factors


βœ… Success Criteria

Phase 1 (Bulletproofs MVP)

  • βœ… Proof generation <100ms
  • βœ… Verification <10ms
  • βœ… Proof size <1KB
  • βœ… 100+ nodes tested
  • βœ… Zero private data leaks (security audit)

Phase 2 (zk-SNARKs)

  • βœ… Proof generation <60s
  • βœ… Verification <10ms
  • βœ… Proof size <500 bytes
  • βœ… 1000+ nodes tested
  • βœ… Formal verification of circuit
  • βœ… Successful trusted setup (100+ participants)

Status: Ready for Phase 10 implementation after Phase 9 deployment Next Action: Deploy Phase 9 with PostgreSQL, schedule ZK-PoC for Phase 10


"Privacy and transparency are not opposites - they are complements. ZK proofs prove we're doing the right thing, without revealing how we did it."