i don't trust you

The most important part about any given encryption scheme is that, given the encrypted message, no one but the intended reader can make head or tail out of it — that it generates nonsense for everyone else. 8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4 is the SHA-256 hash for "hi". Did you guess it?

But what if we don't want total nonsense? What if we want to share just part of a piece of information (e.g. whether our message contains a vowel) without revealing the rest?

That's the main draw of zero-knowledge computations and zero-knowledge proofs: gaining the ability to make conclusions about information you don't know/doing computations with encrypted data. Here's a nice example:

Alice and Bob are two potential lovers. They want to know whether they love each other, but they want to avoid the embarrassment of unrequited love. The naïve way to do this is for Alice and Bob to reveal whether they love the other — at great risk to their social standing! There is, however, a clever way of doing it so that no one suffers unnecessarily.

Both Alice and Bob have two cards: one heart (❤️) and one club (♣️). There is one heart facedown on the table. Alice will place her cards facedown to the left of the heart according to the following rules:

If Alice loves, she will place ♣️❤️. If she doesn't, she will place ❤️♣️.

Bob does the same, but mirrored. He places his cards to the right of the middle, placing ❤️♣️ if he loves and ♣️❤️ if he doesn't. Alice and Bob take turns (secretly) cutting the deck. They put the cards out in a circle, and voila!

Both love: ♣️❤️❤️❤️♣️
At least one doesn't love: ❤️♣️❤️❤️♣️

Every non-match case looks the same (up to rotation). Try it!

some formality...

Hopefully this toy example demonstrates why zero-knowledge computations are even realistic. Both parties can draw a conclusion about the data (whether or not they are a match), but both parties' privacy (and social status) is preserved if it doesn't work out.

Formally, we can compute the following function $f(a,b)$ without sharing $a$ and $b$ individually.

$f(\text{love}, \text{love}) = 1$

$f(\text{love}, \text{not-love}) = 0$

$f(\text{not-love}, \text{love}) = 0$

$f(\text{not-love}, \text{not-love}) = 0$

The output of $f$ is identical for all three non-match cases. Our clever procedure let us preserve that property during our computation.

This example provides us a couple other things to notice. Note that the relationship here is symmetric — both Alice and Bob do the same thing. Later, we'll expand this interaction into interactions between Provers and Verifiers, where Alice will try to prove to Bob that she knows secret X without revealing X to Bob. Also note that Alice and Bob have to go back and forth; we'll later explore how we can avoid extra interactions.

Finally, one might naturally wonder if we can extend this to more than two people. This situation actually provides some useful foundations upon which we'll build, so...

what do you mean i have trust issues?

The canonical example for secure multi-party computation is the following situation:

Suppose $n$ professors want to compute their average salary, but no professor wants to reveal their salary to the others. These professors are so guarding that they want their salary to be secret even if $n-2$ of the professors collude.

The professors have a fair bit of time on their hands, so they come up with an elaborate scheme.

Each professor chooses $n$ random numbers that add up to their salary.
They send each other professor one of their random numbers, keeping one of them secret.
Once a professor receives everyone else's numbers, they add them up along with the number they kept secret and reveal their sum.
Each professor can now add up all the resulting sums, divide by $n$ , and voila! Average salary computed.

Let's explore why this works. Let's say professor $i$ has salary $s_i$ , and generates the list $p_{i, 1}, p_{i, 2}, \dots, p_{i, n}$ . They keep $p_{i, i}$ secret.

Now professor $j$ has got all the numbers $p_{j, 1}, p_{j, 2}$ and so on. He computes

T_j = \sum_{k=1}^n p_{k, j} = p_{1, j} + p_{2, j} + \dots + p_{n, j},

and sends it out. Finally, we can compute the sum of all the $T_j$ 's:

\begin{align*} \sum T_j = &T_1 + T_2 + T_3 + \dots + T_n \\ = &p_{1, 1} + p_{2, 1} + p_{3, 1} + \dots+ p_{n, 1} \\ + &p_{1, 2} + p_{2, 2} + p_{3, 2} + \dots+ p_{n, 2} \\ \vdots \\ +&p_{1, n} + p_{2, n} + p_{3, n} + \dots + p_{n, n} \end{align*}

Each professor sums across the rows of this grid (the $T_j$ 's). But when they're all added together like this, we can sum by columns and get, by definition,

\sum T_j = s_1 + s_2 + \dots + s_n,

The sum of all the professor's salaries, from which we can compute the average.

Even if $n-2$ of the professors collude and share the messages they received, they can't extract the other two professors' salaries.

Now let's examine a generalized version of this protocol, meeting an important character we'll come across again later in the series, when I'll talk about zero-knowledge proofs.

$k$ out of $n$ secret sharing

This protocol is called Shamir secret sharing. It makes use of a mathematical object familiar to many high school students: the humble polynomial.

Instead of sharing a random list of numbers, we're going to collapse them into a single polynomial with random coefficients. Let's say we want to split secret $x$ (some number) into $n$ pieces (or "shares"), such that learning $x$ is equivalent to learning $k$ out of $n$ pieces. We'll define the polynomial

s(t) = x + a_1t + a_2t^2 + \dots+ a_{k-1}t^{k-1}

with all the $a$ 's chosen randomly. The shares will be pairs $(1, s(1))$ , $(2, s(2))$ and so on until $(n, s(n))$ . The secret $x$ is accessible only through $s(0)$ .

Why is this secure? There's a neat property of polynomials that basically says that $k+1$ points determine a unique degree- $k$ polynomial, but with $k$ points there are infinitely many degree- $k$ polynomials — and infinitely many possible values of $s(0)$ , so $x$ is safe. Our polynomial has degree $k-1$ , which means that one would need $k$ shares to find $x$ , exactly as we want it!

There are a couple details I've glossed over that we can make here. Right now, every point still provides a little extra information as to what our polynomial looks like; if you find $k-1$ shares, you're closer to finding $x$ than someone who has only 1 share. We want this to be stronger: with less than $k$ shares, you have no extra information. Doing all this modulo a prime $p > n$ is how we achieve that (also known as "finite field arithmetic"). I also haven't discussed the actual process of reconstructing (or "opening") this secret from $k$ shares. It's mostly just boring algebra though, so the Wikipedia page is more than enough.

proving that i know what i'm talking about

So we've gone over some intuition about how zero-knowledge and secure computation works, and gone over particular examples. What we're missing, then, is generality. Being able to compute love-me-love-me-not is great, but we want to be able to do more than that. This needs to be tackled more formally, and so the math gets very intimidating very quickly. In a future post I hope to explore the math behind zero-knowledge proofs and computation — proving that one knows a piece of information, and doing any arbitrary computation on secret inputs without revealing what they are.

Just for fun, here's a list of some other neat examples of where zero-knowledge proofs are useful:

You're paying me to do a really hard math test for you. You don't want to pay unless you're sure I have the answers, but I don't want to give you my answers unless you pay me.
I want to pay you back for lunch, but I don't want you to see how little money I have in my wallet. You want your money, but you want to make sure I actually have enough money to pay you back.
We're looking through Where's Waldo and I want to prove that I found Waldo without telling you where Waldo is.
I made a Where's Waldo book and I want to prove that Waldo is there without telling you where Waldo is.

See you next time!

references

The math, the salary-sharing and the love/love-me-not examples are due to A Course in Cryptography by Pass and Shelat (pdf).