Collaboration Introduction

Protocol Labs, the Ethereum Foundation, the Filecoin Foundation, Supranational, Microsoft Research, and the Electric Coin Company are pleased to announce a collaboration to improve the performance of SNARKs and to produce a practical SNARK-based VDF design and implementation. SNARKs and verifiable delay functions (VDFs) have the potential to improve the security, privacy, and scalability of blockchain platforms. Privacy and scalability are two of the main concerns preventing broader use of blockchains in end-user applications. By developing new SNARK systems with expanded capabilities and accelerating the cryptographic primitives at their core, we can leverage novel cryptography to help address these concerns. With these improved capabilities we hope to unlock new use cases and reduce costs, thereby enabling broader adoption.

Background on VDFs

In distributed protocols, computations are sometimes made deliberately difficult or time-consuming as a way to ensure that participants expend some amount of time or energy. As an example, Bitcoin’s proof-of-work ensures that a large amount of computation is performed for each block, making it difficult to rewrite the ledger history. However, proof-of-work computations are distributed, and the network periodically adjusts the difficulty of the work as computing power is added to or removed from the network. If it didn’t, the network would lose control over the rate at which blocks are produced.

For some applications, however, we need computations that remain just as time-consuming even with access to large amounts of (distributed) computational power. By requiring that a computation can only be performed efficiently as a series of sequential operations (unlike proof-of-work), we ensure that a single fast computer or piece of dedicated hardware is no less capable of performing it than an army of machines or a supercomputer. At the same time, the computation’s result must be efficient to check, even though the computation itself is deliberately difficult to perform in the first place.

VDFs fulfill all of the above goals: they are functions that are efficient to verify while introducing a delay that cannot be shortened through parallel computation. One example is computing a fifth root in a finite field: checking that the result is correct is a matter of raising it to the fifth power, which takes just two squarings and a multiplication – far cheaper than computing the root. VDFs can be used to produce cryptographically secure randomness, prevent denial-of-service attacks, and enable cryptographically provable storage replication. VDFs are being evaluated by a number of leading blockchain platforms, including Ethereum, Filecoin, Tezos, and Zcash, and improving their performance is critical to bringing them to production.
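
To make the asymmetry concrete, here is a minimal Python sketch over a toy prime (all parameters here are illustrative; a production VDF would use a large, carefully chosen field). Computing the fifth root requires one exponentiation by a large inverse exponent, while checking it takes only two squarings and one multiplication.

```python
# Toy sketch of the fifth-root asymmetry over a small prime field (Python 3.8+).
# The modulus is illustrative; the map x -> x^5 is a permutation only when
# gcd(5, p - 1) == 1, which a real deployment would guarantee by construction.
from math import gcd

p = 1_000_003                       # toy prime with gcd(5, p - 1) == 1
assert gcd(5, p - 1) == 1
d = pow(5, -1, p - 1)               # inverse exponent: (x^5)^d == x (mod p)

def fifth_root(y: int) -> int:
    """Slow direction: one large exponentiation (many sequential multiplies)."""
    return pow(y, d, p)

def check(x: int, y: int) -> bool:
    """Fast direction: verify x^5 == y with two squarings and one multiplication."""
    x2 = x * x % p                  # x^2
    x4 = x2 * x2 % p                # x^4
    return x4 * x % p == y          # x^5

y = 123_456
x = fifth_root(y)
assert check(x, y)
```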

Collaboration Goals

Currently, many verifiable and confidential computing techniques like SNARKs and VDFs cannot be brought into production due to their computational expense, which renders these proof systems impractical for many use cases. In this collaboration, we will perform research and development to improve the performance and reduce the cost of these operations. In particular, we will develop a VDF based on a variant of the Sloth algorithm, with proofs generated by the Halo 2 and Nova proving systems. We will develop optimized implementations that leverage both CPU and GPU architectures, and conduct initial research on the feasibility and performance of an ASIC-based system, with the ultimate goal of a system that supports general-purpose recursive SNARKs and improves cost and performance by a factor of 10.

SNARK-based VDF Implementation

In the case of SNARK-based VDFs, there are two key functions that must be improved: VDF evaluation and proof generation. Each must be made efficient to improve the user experience, reduce cost, and improve the security of the VDF. For VDF evaluation, it is critical that the implementation be as fast as possible. VDF evaluation is a ‘single-threaded’ process that requires a series of sequential modular multiplication and modular squaring operations, and as such should be optimized for low latency. VDF proving, on the other hand, requires a large number of similar operations, but these can be performed in parallel, so the implementation should target high throughput. These differing optimization goals naturally lead to different software and hardware architecture choices.
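
To illustrate this split in miniature, the following Python sketch (with a toy field and parameters far smaller than anything realistic) evaluates the VDF as a strictly sequential chain of fifth roots and then checks the recorded trace with the cheap forward map. The evaluation loop is latency-bound because each step consumes the previous step’s output, while every step of the check is independent and could be spread across many cores.

```python
# Toy contrast between latency-bound evaluation and throughput-bound checking.
# Field size, iteration count, and function names are illustrative only.
p = 1_000_003                        # toy prime with gcd(5, p - 1) == 1
d = pow(5, -1, p - 1)                # fifth-root exponent

def evaluate(seed: int, t: int) -> list[int]:
    """Latency-bound: each iteration needs the previous iteration's output."""
    trace = [seed % p]
    for _ in range(t):
        trace.append(pow(trace[-1], d, p))
    return trace

def check_trace(trace: list[int]) -> bool:
    """Throughput-bound: each step can be checked independently, in parallel."""
    return all(pow(x_next, 5, p) == x_prev
               for x_prev, x_next in zip(trace, trace[1:]))

trace = evaluate(seed=42, t=1_000)
assert check_trace(trace)
```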

VDF Evaluation

When optimizing VDF evaluation for latency, a number of factors can improve the speed of the function. While VDFs are designed to require serial operations, algorithmic choices at both the software and hardware levels can improve their performance. Examples include the use of addition chains to compute the fifth root, alternative representations such as Montgomery form to speed up each modular operation, and hardware algorithms that reduce the circuit depth of the modular arithmetic core. Beyond these algorithmic choices, the underlying hardware platform also affects speed, with factors like the architecture’s word size and the processor’s clock frequency influencing the time taken by each operation. During this research we intend to implement the VDF evaluation function on commodity CPUs in order to achieve a fast implementation using ‘off-the-shelf’ hardware. This implementation will leverage algorithmic techniques and handwritten assembly to improve performance as much as possible. In addition to the CPU implementation, we will also design a custom hardware circuit to perform VDF evaluation and then perform a variety of analyses to better understand the potential performance improvements that custom silicon provides.
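
As one example of the representation-level choices mentioned above, the sketch below shows textbook Montgomery reduction (REDC) in Python. The structure and names are ours; real implementations operate on fixed-size machine words in handwritten assembly or dedicated hardware rather than on arbitrary-precision integers.

```python
# Textbook Montgomery reduction (REDC): modular multiplication without dividing by p.
# Illustrative only; production code works word-by-word on fixed-size limbs.
p = 1_000_003                      # odd toy modulus
k = p.bit_length()
R = 1 << k                         # R = 2^k > p, gcd(R, p) == 1
p_neg_inv = -pow(p, -1, R) % R     # p' = -p^(-1) mod R

def redc(t: int) -> int:
    """Return t * R^(-1) mod p for 0 <= t < p * R, using only shifts and masks."""
    m = (t & (R - 1)) * p_neg_inv & (R - 1)   # m = (t mod R) * p' mod R
    u = (t + m * p) >> k                      # t + m*p is exactly divisible by R
    return u - p if u >= p else u

def to_mont(a: int) -> int:
    return a * R % p               # enter Montgomery form: a -> a*R mod p

def mont_mul(a_m: int, b_m: int) -> int:
    return redc(a_m * b_m)         # (aR)(bR)R^(-1) = abR mod p

def from_mont(a_m: int) -> int:
    return redc(a_m)               # leave Montgomery form: aR -> a

a, b = 123_456, 654_321
assert from_mont(mont_mul(to_mont(a), to_mont(b))) == a * b % p
```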

VDF Proof Generation

For VDF proof generation, we use proofs of knowledge to show that the VDF verification operation was performed correctly. In particular, we use Proof-Carrying Data (PCD) and Incrementally Verifiable Computation – realized using Halo 2 and Nova – to distribute the task of building the proof across different platforms and accelerate the process. In the end, a single, relatively small proof is created that can be checked far more efficiently than re-running the verification algorithm directly.
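
To sketch the structure (not the cryptography) of this approach, the toy Python below splits an evaluation trace into contiguous segments and checks each segment independently by re-running the cheap forward map. In the actual system each segment check would be a Halo 2 or Nova proof rather than plain re-execution; the field, segment size, and function names here are illustrative.

```python
# Toy segmentation of the evaluation trace among parallel 'provers'.
# Each segment check stands in for a SNARK proof of that portion.
from concurrent.futures import ProcessPoolExecutor

p = 1_000_003
d = pow(5, -1, p - 1)

def evaluate(seed: int, t: int) -> list[int]:
    """Sequential evaluator emitting every intermediate result."""
    trace = [seed % p]
    for _ in range(t):
        trace.append(pow(trace[-1], d, p))
    return trace

def check_segment(segment: list[int]) -> bool:
    """Stand-in for a per-segment proof: confirm x_prev == x_next^5 at each step."""
    return all(pow(x_next, 5, p) == x_prev
               for x_prev, x_next in zip(segment, segment[1:]))

def segments(trace: list[int], size: int):
    """Contiguous chunks that share a boundary value with their neighbours."""
    for i in range(0, len(trace) - 1, size):
        yield trace[i:i + size + 1]

if __name__ == "__main__":
    trace = evaluate(seed=42, t=4_096)
    with ProcessPoolExecutor() as pool:          # the 'provers' work in parallel
        assert all(pool.map(check_segment, list(segments(trace, 512))))
```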

The provers will need to keep pace with the evaluator – a computer or dedicated ASIC hardware – so the evaluator produces intermediate results at regular intervals during evaluation. These results are portioned out among the provers, which produce proofs (in parallel) that contiguous portions of the verification algorithm have been correctly executed. These are shown as leaves in the diagram below.

[Figure 1: An example of a left-heavy binary tree proof composition strategy. By the time the prover receives the final VDF result, it would have already completed most of the recursive proofs.]

The leaves are then chained together via PCD, which folds contiguous proofs together as they are created. The number of provers and the complexity of the proofs are adjusted so that the latency between the evaluator completing its computation and the final proof being produced is as small as possible.
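
The aggregation schedule itself can be sketched independently of the cryptography. In the toy Python below a ‘proof’ is just a claim (start value, end value, step count), and folding merges two contiguous claims; in the real system both the leaves and the folds are SNARK proofs, and the exact tree shape is tuned to the number of provers. A simple running left fold, as shown here, is one way to realize a left-heavy schedule in which earlier leaves are absorbed while later segments are still being evaluated.

```python
# Toy left-heavy aggregation of segment claims. A Claim(start, end, steps) asserts
# that 'steps' VDF iterations map 'start' to 'end'; fold() merges contiguous claims.
# Real PCD folds SNARK proofs rather than bare claims.
from dataclasses import dataclass

@dataclass
class Claim:
    start: int    # value entering the segment
    end: int      # value leaving the segment
    steps: int    # number of VDF iterations covered

def fold(left: Claim, right: Claim) -> Claim:
    """Merge two contiguous claims into one covering both segments."""
    assert left.end == right.start, "claims must be contiguous"
    return Claim(left.start, right.end, left.steps + right.steps)

def aggregate(leaves: list[Claim]) -> Claim:
    """Fold each new leaf into the running accumulator as soon as it arrives."""
    acc = leaves[0]
    for leaf in leaves[1:]:
        acc = fold(acc, leaf)
    return acc

leaves = [Claim(7, 11, 512), Claim(11, 3, 512), Claim(3, 9, 512)]
total = aggregate(leaves)
assert (total.start, total.end, total.steps) == (7, 9, 1536)
```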

The choice of VDF operation also affects the relative performance of the prover and evaluator. The prover calculates witnesses in the fast direction, x → x^5, giving it approximately a 94× advantage over the evaluator, which computes x → x^(1/5). We do not use the common x → x^3 construction because the fields of the Pallas/Vesta curves used in Halo 2 and Nova have a multiplicative subgroup of order 3, meaning exponentiation by 3 is not a permutation.
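
The underlying condition is that the exponent must be coprime to the order of the field’s multiplicative group, p − 1; the same gcd test that rules out x → x^3 here confirms that x → x^5 is a permutation. The toy check below uses F_13 purely for illustration.

```python
# Toy check of when x -> x^e permutes a prime field: e must be coprime to p - 1.
# F_13 has a multiplicative subgroup of order 3 (3 divides 12), so cubing collides,
# while gcd(5, 12) == 1 makes the fifth-power map a bijection.
from math import gcd

p = 13
cubes = {pow(x, 3, p) for x in range(p)}
fifths = {pow(x, 5, p) for x in range(p)}

assert gcd(3, p - 1) == 3 and len(cubes) < p      # x -> x^3 is not a permutation
assert gcd(5, p - 1) == 1 and len(fifths) == p    # x -> x^5 is a permutation
```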

As with VDF evaluation, VDF proving can be optimized through algorithms, low-level programming, and custom silicon. However, unlike evaluation, VDF and SNARK proving are highly parallel operations, lending themselves to many-core architectures like GPGPUs. During our research we intend to implement algorithms that accelerate SNARK proof generation for VDFs by writing custom CUDA code. This CUDA code will target the core arithmetic functions required for proof generation, such as multi-scalar multiplication and number-theoretic transforms. In addition to this GPU implementation, we also intend to study how the cost and performance of SNARK proof generation could be improved through custom silicon. With the appropriate design, we believe it is possible to develop a system that improves performance by an order of magnitude or more over existing designs. We hope that these improvements in performance, cost, and access will enable new use cases that have previously been impractical.
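
To give a flavor of one of these kernels, the sketch below implements a textbook iterative radix-2 number-theoretic transform in Python over a small NTT-friendly prime. The butterflies within each stage are independent of one another, which is exactly the structure that maps well to thousands of GPU threads; the prime, sizes, and code here are illustrative and unrelated to the fields used in the actual work.

```python
# Textbook iterative radix-2 NTT over P = 998244353 = 119 * 2^23 + 1 (primitive root 3).
# The n/2 butterflies in each of the log2(n) stages are independent, which is the
# parallelism a CUDA implementation would exploit.
P = 998244353
G = 3

def ntt(values: list[int], invert: bool = False) -> list[int]:
    a = list(values)
    n = len(a)                           # must be a power of two dividing P - 1
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a

coeffs = [5, 3, 0, 7, 1, 0, 0, 0]        # coefficients of a degree-<8 polynomial
assert ntt(ntt(coeffs), invert=True) == coeffs
```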