Inductive Proof Systems and Recursive SNARKs

This blog post describes a powerful technique for defining systems that allows for scalable, verifiable computation on partially private data. There are many applications, some more exciting than others depending on your tastes, but I hope something for everybody:

Private, verifiable elections (which we’ll discuss in this post)
Succinct blockchains
Private, scalable DEXes
Auditable machine learning
Private, verifiable social wealth funds
A web app that proves to its users that it’s deleting all information older than a year (or at least not using it)
Verifying authenticity of a photo even when it has been edited multiple times by multiple people (and accompanied by a log of who edited it)

The technology for accomplishing this is essentially what has been called proof carrying data, but we’ll be introducing some new notation and call the resulting concept and notation “inductive proof systems”.

We at Coda have implemented inductive proof systems in the pickles library, which combined with a universal SNARK allows arbitrary applications (a.k.a., “Turing complete computations”) to be deployed in a scalable way on top of scalable public blockchains like Coda.

Verifiable voting as an inductive proof system

To get a feel for this technology, let’s dive into the specific example of a private, verifiable election$^{[1]}$. It will be private in the sense that each vote will have no identifying information attached to it. It will be verifiable in the sense that at the end we’ll have a small proof of the result (on the order of a few kBs) that can be easily checked on almost any device, including a browser or a phone.

For simplicity let’s say we have an election with two choices: 0 and 1. We’ll have two types of participants in this system:

Voters. Voters will create a zero-knowledge proof that proves “I am eligible to vote, my vote is for choice $c$ and $s$ is my ballot-stub”. The public input for this ZKP will be the pair $(c, s)$. The “ballot-stub” will be a piece of data that is unique to a registered voter, cannot be linked to that voter’s identity, and not knowable by anyone else. Think of it as a kind of unique signature on the message “I voted” with the voter’s secret-key. We’ll use this to make sure no one votes more than once.
Aggregators. Aggregators will collect votes and then produce a zero-knowledge proof that proves “I know a set of votes resulting in vote totals $(n_0, n_1)$ and with the corresponding set of ballot stubs $S$”. The public input for this ZKP will be $((n_0, n_1), S)$. Actually, to keep things succinct, it will not be $S$ but a succinct representation thereof, which we can get into later.

Once all the votes have been aggregated, we will be left the final vote-count $(n_0, n_1)$ and stub set $S$ with a single tiny SNARK which proves the existence of valid votes corresponding to $S$ which add up to the claimed values. Thus, we’ll achieve our goal of a private and verifiable voting system.

Toward a specification

Now that we have an informal specification of what we want, we need a cryptographer to translate it into a mathematical description, and then a programmer to translate that into a fully formal description (i.e., a program).

To that end, let’s describe a mathematical notation for capturing this specification. This notation will have the advantage of being quite close to the fully formal description that a programmer would write and can execute using the pickles library, which we’ll discuss in more detail later on.

First, define a set $\mathsf{Vote}$ which has the following property

IF there exists$^{[2]}$ a voting secret key $\mathsf{sk}$ such that $\mathsf{pk} := \mathsf{privKeyToPubKey(sk)}$, $\mathsf{isEligible(pk)} = \mathsf{true}$, $s = \mathsf{ballotStub}(sk)$, and there exists a signature$^{[3]}$ $\mathsf{sig}$ such that $\mathsf{sigVerify}(\mathsf{sig}, \mathsf{pk}, c) = \mathsf{true}$, THEN $(c, s) \in \mathsf{Vote}$.

We are assuming that we have some algorithm $\mathsf{isEligible}$ that checks eligibility of a public key, a way of obtaining public keys from private voting keys, and a way of creating and verifying signatures using a voting private-public keypair.

This is a bit hard to read. We can more compactly represent this property with the notation

$$
\frac{
\mathsf{pk} = \mathsf{privKeyToPubKey(sk)} \qquad
\mathsf{isEligible(pk)} \qquad
\mathsf{sig} \colon \mathsf{Signature} \qquad
\mathsf{sigVerify}(\mathsf{sig}, \mathsf{pk}, c)
}{(c, \mathsf{ballotStub(sk)}) \in \mathsf{Vote}}
$$

The way one reads this notation is “If everything on the top is true, then the bottom is true.” You can also read it backwards, like “in order to prove $(c, s) \in \mathsf{Vote}$ it suffices to provide everything on the top.” This notation is known as a “natural deduction-style inference rule.” You can read more about this notation in the context of logic here or here.

Next, we’ll give similar rules that describe how we can prove aggregate vote totals. We’ll give a description of a set $\mathsf{VoteCount}$ whose elements will be the tuples $((n_0, n_1), S)$ such that there exists a set of votes with corresponding stubs $S$ and such that the total number of votes for $0$ is $n_0$ and for $1$ is $n_1$.

The first rule is
$$
\frac{
(c, s) \in \mathsf{Vote}
}{
( (\textsf{if } c = 0 \textsf{ then } (1, 0) \textsf{ else } (0, 1)), \{ s \} ) \in \mathsf{VoteCount}
}
$$
The second rule is$^{[4]}$
$$
\frac{
((n_0, n_1), S) \in \mathsf{VoteCount} \qquad
((m_0, m_1), T) \in \mathsf{VoteCount} \qquad
}{
( (n_0 + m_0, n_1 + m_1), S \cup T ) \in \mathsf{VoteCount}
}
$$

You can see that these two rules completely characterize $\mathsf{VoteCount}$ in that any member of $\mathsf{VoteCount}$ can be proven to be a member by repeated application of these rules.

For instance, let’s say we have votes $(0, s_1), (0, s_2), (1, s_3), (0, s_4) \in \mathsf{Vote}$ and we would like to prove that $((3, 1), \{ s_1, s_2, s_3, s_4 \}) \in \mathsf{VoteCount}$. This we can do as follows:

$$
\frac{
\frac{
\frac{
(0, s_1) \in \mathsf{Vote}
}{
((1, 0), \{ s_1 \}) \in \mathsf{VoteCount}
} \qquad
\frac{
(0, s_2) \in \mathsf{Vote}
}{
((1, 0), \{ s_2 \}) \in \mathsf{VoteCount}
}
}{
((1 + 1, 0 + 0), \{ s_1 \} \cup \{ s_2 \} \in \mathsf{VoteCount}
} \qquad
\frac{
\frac{
(1, s_3) \in \mathsf{Vote}
}{
((0, 1), \{ s_3 \}) \in \mathsf{VoteCount}
} \qquad
\frac{
(0, s_4) \in \mathsf{Vote}
}{
((1, 0), \{ s_4 \}) \in \mathsf{VoteCount}
}
}{
((0 + 1, 1 + 0), \{ s_3 \} \cup \{ s_4 \} \in \mathsf{VoteCount}
}
}{
((2 + 1, 0 + 1), \{ s_1, s_2 \} \cup \{s_3, s_4 \}) \in \mathsf{VoteCount}
}
$$

This setup as an inductive proof system

Taken together, the one rule describing the set $\mathsf{Vote}$ and the two rules describing the set $\mathsf{VoteCount}$ give a full description of the system we would like, or at least the part of it related to producing proofs.

What we want is to obtain a “prover” algorithm for each of our rules. That way, we get a prover which lets voters produce voting-proofs, and provers which lets aggregators merge together voting proofs into proofs that prove the correctness of a corresponding vote count. We term the collection of prover algorithms that fit together in the appropriate way an “inductive proof system”.

Then, each application of an inductive-rule will correspond to producing a SNARK and each assumption of the form $\mathsf{someStatement} \in \mathsf{SomeSet}$ will correspond to the recursive verification of a previous SNARK.

Schematically, this whole “proof tree” is represented in this figure, where each purple square represents a voter and the proof they produce, and the big blue square represents the aggregator(s) and the proofs that they produce.

Toward a generalization

We can give a mathematical definition which captures the general notion of which the above sets of inductive rules are specific examples.

We won’t give the full definition here, but the idea is to define an “inductive NP set” as an NP set which can be specified by a sequence of “inductive NP rules”; and an “inductive NP rule” as an efficient algorithm which checks a statement against some witness data and a sequence of predecessor statements whose membership in some NP sets is assumed.

We can then describe a “compiler” which takes an inductive NP set and compiles it into an inductive proof system with a verifier and one prover corresponding to each inductive rule.

A fully formal specification and implementation

A fully formal specification of the above rules would be a program that specifies the inductive NP rules precisely. The pickles library allows one to do just this. It provides a compiler that turns a sequence of inductive rules into a verifier and one prover for each rule. It is joint work with Vanishree Rao and Vitaly Zelov.

It’s currently being used in Coda, and at mainnet launch, we’ll be releasing a SNApps (“snarkified apps”) toolkit built on top of pickles for anyone to build their own private and scalable snarkified smart-contracts on Coda.

[1]: A fully private election would be one in which you only find out the winner. This election system will be only partially private in that you will learn intermediate and the final vote counts as well. In that sense it’s a bit like government elections (at least in the US) where you learn the vote totals for each district, state, etc.

[2]: Actually, by “there exists…”, we mean “there *really* exists…”. That is, “an efficient program can produce…”.

[3]: One could do this more efficiently by relying on the signature-of-knowledge interpretation of the SNARK that pickles uses, but this is fine for explanatory purposes.

[4]: It is difficult to implement a succinct accumulator that supports taking unions, and so really we would do something different than taking unions in the second rule, but please permit this white lie for the sake of a clearer explanation.