The Unreasonable Power of the Sum-CheckProtocol

Introduction

When designing an efficient interactive proof system, there is only one hammer you need to have in your toolbox: the sum-check protocol of Lund, Fortnow, Karloff, and Nisan. The power of this protocol seems to be a bit under-appreciated in the ZKProofs community. I speculate that there are two reasons for this. The first is that the protocol inherently leads to proofs of at least logarithmic length, which means that in applications where super short proofs are really important—say, because these proofs need to be stored for all eternity on a blockchain—other techniques may be preferable (e.g., protocols derived from the work of Gennaro, Gentry, Parno, and Raykova, which tend to have constant proof length).

Second, the sum-check protocol by itself is not zero-knowledge nor “succinct for NP statements”. This means that, for NP statements, the proof length achieved by the sum-check protocol is not sublinear in the size of the NP witness. Note that there is strong evidence that no interactive proof can be succinct for NP statements. This is in contrast to argument systems (which unlike interactive proofs, are only computationally sound).

However, in the last few years, researchers have figured out how to combine the sum-check protocol with cryptographic commitments to obtain arguments that are both zero-knowledge and succinct for NP statements. This has led to zk-SNARKs with state of the art performance (e.g., Hyrax, zk-vSQL, Libra, Virgo, Spartan). So if you are interested in zk-SNARKs and satisfied with proofs of logarithmic length, then I urge you to learn about the sum-check protocol and how to use it.

To this end, the goal of this blog post is to describe the problem solved by the sum-check protocol and why it is so useful. For readers interested in learning more, I have posted a more detailed (and unfortunately more technical) exposition here.

This entire post will be framed in the context of interactive proofs (IPs). This means that the goal is for a verifier $V$ to offload an expensive computation to an untrusted prover $P$, while achieving work-saving for the verifier. We want the verifier to run in time linear in the input size, while keeping the proof short (logarithmic size) and the prover efficient. In a future post, I will explain how to combine the ideas described here with cryptographic commitments to get state of the art zk-SNARKs.

The Sum-Check Protocol

Suppose we are given a $v$-variate polynomial $g$ defined over a finite field $\mathbf{F}$. Let us further assume that $g$ has degree at most 2 in each variable, as this will be the case in all of the applications in both this post and its more detailed version. The purpose of the sum-check protocol is to compute the sum:
$$ H:=\sum_{b_1 \in \{0,1\}} \sum_{b_2 \in \{0, 1\}} \dots \sum_{b_v \in \{0, 1\}} g(b_1, \dots, b_v) \text{ } (1)$$

Summing up the evaluations of a polynomial over all Boolean inputs may seem like a contrived task with limited practical utility. But to the contrary,
later sections of this post will show that many natural problems can be directly cast as an instance of Equation (1).

To keep this post as nontechnical as possible, I will say only a little about how the sum-check protocol actually works; see the more detailed post for a complete description of the protocol and why it is sound. The protocol consists of $v$ rounds, one for each variable of $g$. In each round $i$, the prover sends to the verifier a degree 2 univariate polynomial $g_i$ ($g_i$ can always be specified with just 3 field elements, by either sending its coefficients, or its evaluations at 3 designated inputs in $\mathbf{F}$). The verifier performs some simple randomized consistency checks on each message $g_i$; these checks involve evaluating $g_i$ at a handful of inputs and checking that these evaluations are consistent with previous messages sent by the prover. The verifier can process each message sent by the prover in $O(1)$ time, and at the very end of the protocol the verifier also needs to evaluate $g$ at single point. Throughout, we assume any addition or multiplication operation in $\mathbf{F}$ takes $O(1)$ time.

What does the verifier gain by using the sum-check protocol?

The verifier could clearly compute $H$ via Equation (1) on her own by evaluating $g$ at $2^v$ inputs (namely, all inputs in $\{0,1\}^v$), but we are thinking of $2^v$ as an unacceptably large runtime for the verifier. Using the sum-check protocol, the verifier’s runtime is

$$O(v + \text{[the cost to evaluate } g \text{ at a single input in } \mathbf{F}^v ])$$

This is much better than the $2^v$ evaluations of $g$ required to compute $H$ unassisted. It also turns out that the prover in the sum-check protocol can compute all of its prescribed messages by evaluating $g$ at $O(2^v)$ inputs in $\mathbf{F}^v$. This is only a constant factor more than what is required simply to compute $H$ without proving correctness. The soundness error of the sum-check protocol is $O(v/|\mathbf{F}|)$. As long as $g$ is defined over a field of size significantly greater than $v$, this error is very small.

Applications

At this point in the post, we have our hammer in hand: the sum-check protocol, which allows a verifier to offload the computation expressions of the form of Equation (1) to an untrusted prover. However, wielding this hammer to solve problems people care about can require a good deal of cleverness. The goal of the rest of this post is give a flavor of how this typically works.

The general challenge is the following: suppose the verifier has an input $x$ and asks the prover to compute some function $F$ of $x$. To apply the sum-check protocol to compute $F$, we need to be able to express $F(x)$ as an instance of Equation (1). This means that we need to identify some $v$-variate polynomial $g$ of degree 2 in each variable such that $F(x)$ can be inferred from the sum of $g$’s values over all inputs in $\{0, 1\}^{v}$. Moreover, the verifier needs to be able to evaluate $g(r)$ at any desired input $r \in \mathbf{F}^v$ in linear time.

I will explain one illustrative example of this paradigm, in which the input $x$ is the adjacency matrix of a graph, and $F(x)$ is the number of triangles in that graph. To accomplish this, I have no choice but to introduce one technical notion, called multilinear extensions, defined in the lemma below. To avoid unnecessary details, I do not prove the lemma, but it follows readily from Lagrange interpolation.

Multilinear Extension Lemma

Let $f \colon \{0, 1\}^n \to \mathbf{F}$. Then there is a unique multilinear polynomial $\tilde{f}$ over $\mathbf{F}$ such that $\tilde{f}(x) = f(x)$ for all $x \in \{0,1\}^n$. Here, a polynomial is said to be multilinear if it has degree at most 1 in each variable. $\tilde{f}$ is called the multilinear extension (MLE) of $f$. Given as input a list of all $2^n$ evaluations of $f$, and an arbitrary point $r \in \mathbf{F}^n$, there is an algorithm that can evaluate $\tilde{f}(r)$ in $O(2^n)$ time.

An Application: An IP for Counting Triangles

Let $G$ be an undirected graph on $n$ vertices with edge set $E$. Let $A \in \{0, 1\}^{n \times n}$ be the adjacency matrix of $G$, i.e., $A_{i,j}=1$ if and only if $(i, j) \in E$. In the counting triangles problem, the input is the adjacency matrix $A$, and the goal is to determine the number of vertex triples $(i, j, k)$ that are all connected to each other by edges.

At first blush, it is totally unclear how to express the number of triangles in $G$ as the sum of the evaluations of a degree-2 polynomial $g$ over all inputs in $\{0, 1\}^v$. After all, the counting triangles problem itself makes no reference to any low-degree polynomial $g$, so where will $g$ come from? This is where multilinear extensions come to the rescue.

For it to make sense to talk about multilinear extensions, we need to view the adjacency matrix $A$ not as a matrix, but rather as a function $f_A$ mapping $\{0, 1\}^{\log n} \times \{0, 1\}^{\log n}$ to $\{0, 1\}$. The natural way to do this is to define $f_A(x, y)$ so that it interprets $x$ and $y$ as the binary representations of some integers $i$ and $j$ between $1$ and $n$, and outputs $A_{i, j}$.

Then the number of triangles in $G$ is simply: $$ \Delta := \frac{1}{6} \sum_{x, y, z \in \{0, 1\}^{\log n}} f_A(x, y) \cdot f_A(y, z) \cdot f_A(x, z) \textit{ } (2)$$
To see that this equality is true, observe that the term for $x, y, z$ in the above sum is 1 if edges $(x, y)$, $(y, z)$, and $(x, z)$ all appear in $G$, and is 0 otherwise. The factor $1/6$ comes in because the sum over unordered node triples $(i, j, k)$ counts each triangle 6 times, once for each permutation of $i$, $j$, and $k$.

Let $\mathbf{F}$ be a finite field of size $p \geq n^3$, where $p$ is a prime, and let us view all entries of $A$ as elements of $\mathbf{F}$. Here, we are choosing $p$ large enough so that $6\Delta$ is guaranteed to be in $\{0, 1, \dots, p\}$. This ensures that, if we associate elements of $\mathbf{F}$ with integers in $\{0, 1, \dots, p\}$ in the natural way, then Equation (2) holds even when all additions and multiplications are done in $\mathbf{F}$ rather than over the integers. (Choosing a large field to work over has the added benefit of ensuring good soundness error, as the soundness error of the sum-check protocol decreases linearly with field size.)

At last we are ready to describe the polynomial $g$ to which we will apply the sum-check protocol to compute $6 \Delta$. Recalling that $\tilde{f}_A$ is the MLE of $f_A$ over $\mathbf{F}$, define the $(3 \log n)$-variate polynomial $g$ to be: $$g(X, Y, Z) = \tilde{f}_A(X, Y) \cdot \tilde{f}_A(Y, Z) \cdot \tilde{f}_A(X, Z)$$

It is easy to see that $$6 \Delta = \sum_{x, y, z \in \{0, 1\}^{\log n}} g(x, y, z),$$ so applying the sum-check protocol to $g$ yields an IP computing $6\Delta$. This IP requires $3 \log n$ rounds, with the prover sending 3 field elements in each round. The verifier’s runtime is dominated by the time required to evaluate $g$ at a single input $(r_1, r_2, r_3) \in \mathbf{F}^{3 \log n}$, for which it suffices to evaluate $\tilde{f}_A$ at the three inputs $(r_1, r_2)$, $(r_2, r_3)$, and $(r_1, r_3)$. This can be done in $O(n^2)$ time using the Multilinear Extension Lemma. This is much faster than the fastest known algorithm for counting triangles, which runs in matrix multiplication time (superlinear in the input size).

It turns out that the prover in this IP can compute all of its prescribed messages in $O(n^3)$ time. This is not obvious, and for brevity, I’ll omit the details of how to accomplish this. Note that this runtime for the prover matches that of the the naive algorithm for counting triangles that iterates over all triples of vertices in $G$ and checks if they are all connected to each other.

More Applications

Hopefully the above gives a sense of how problems that people care about can be expressed as instances of Equation (1) in non-obvious ways. The general paradigm works as follows. An input $x$ of length $n$ is viewed as a function $f_x$ mapping $\{0, 1\}^{\log n}$ to some field $\mathbf{F}$. And then the multilinear extension $\tilde{f}_x$ of $f_x$ is used in some way to construct a low-degree polynomial $g$ such that, as per Equation (1), the desired answer equals the sum of the evaluations of $g$ over the Boolean hypercube.

The full version of this post covers some additional examples of this paradigm. In the hopes of enticing you to check it out, here is a summary of the examples covered there.

First, it gives a more sophisticated IP for counting triangles in which the prover is much more efficient than the above. Specifically, the prover runs the best-known algorithm to solve the triangles problem, and then does a low-order amount of extra work to prove the answer is correct. I don’t know of any other IPs or argument systems that achieve this super-efficiency for the prover while keeping the proof length sublinear in the input size.

Second, it gives a similarly super-efficient IP for matrix-powering. Given any $n \times n$ matrix $A$ over field $\mathbf{F}$, this IP is capable of computing any desired entry of the powered matrix $A^k$. The number of rounds and communication cost of the IP are $O(\log(k) \cdot \log n)$, and the verifier’s runtime is $O(n^2 + \log(k) \cdot \log n)$.

Finally, it uses this matrix-powering protocol to re-prove the following important result of Goldwasser, Kalai, and Rothblum (GKR): all problems solvable in logarithmic space have an IP with a quasilinear-time verifier, polynomial time prover, and polylogarithmic proof length. The basic idea of the proof is that executing any Turing Machine $M$ that uses $s$ bits of space can be reduced to the problem of computing a single entry of $A^{2^s}$ for a certain matrix $A$ ($A$ is in fact the configuration graph of $M$). So one can just apply the matrix-powering IP to $A$ to determine the output of $M$. While $A$ is a huge matrix (it has at least $2^s$ rows and columns), configuration graphs have a ton of structure, and this enables the verifier to evaluate $\tilde{f}_A$ at a single input in $O(s \cdot n)$ time. If $s$ is logarithmic in the input size, then this means that the verifier in the IP runs in $O(n \log n)$ time.

The original paper of GKR proved the same result by constructing an arithmetic circuit for computing $A^{2^s}$ and then applying a sophisticated IP for arithmetic circuit evaluation to that circuit. The approach described above is simpler, in that it directly applies a simple IP for matrix-powering, rather than a more complicated IP for the general circuit-evaluation problem.