
Sep 05, 2016

Approximating E3-LIN is NP-Hard

This lecture, which I gave for my 18.434 seminar, focuses on the MAX-E3LIN problem. We prove that approximating it is NP-hard by a reduction from LABEL-COVER.

1. Introducing MAX-E3LIN

In the MAX-E3LIN problem, our input is a series of linear equations $\pmod 2$ in $n$ binary variables, each with three terms. Equivalently, one can think of this as $\pm 1$ variables and ternary products. The objective is to maximize the fraction of satisfied equations.

Example 1 (Example of MAX-E3LIN instance)

\[
\begin{aligned}
x_1 + x_3 + x_4 &\equiv 1 \pmod 2 \\
x_1 + x_2 + x_4 &\equiv 0 \pmod 2 \\
x_1 + x_2 + x_5 &\equiv 1 \pmod 2 \\
x_1 + x_3 + x_5 &\equiv 1 \pmod 2
\end{aligned}
\]

\[
\begin{aligned}
x_1 x_3 x_4 &= -1 \\
x_1 x_2 x_4 &= +1 \\
x_1 x_2 x_5 &= -1 \\
x_1 x_3 x_5 &= -1
\end{aligned}
\]

A diligent reader can check that we may obtain $\frac34$ but not $1$.
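For the less diligent, the claim can be checked by brute force. A quick sketch (the equations below are exactly those of Example 1, with indices into $(x_1, \dots, x_5)$):

```python
from itertools import product

# The four equations of Example 1, as (variable indices, target mod 2).
equations = [((0, 2, 3), 1), ((0, 1, 3), 0), ((0, 1, 4), 1), ((0, 2, 4), 1)]

def value(assignment):
    """Fraction of equations satisfied by a 0/1 assignment."""
    good = sum(1 for idx, t in equations
               if sum(assignment[i] for i in idx) % 2 == t)
    return good / len(equations)

# Exhaust all 2^5 assignments; the optimum is 3/4.
best = max(value(a) for a in product((0, 1), repeat=5))
print(best)  # 0.75
```

Indeed, summing all four left-hand sides gives an even quantity, while the right-hand sides sum to $3$, so all four equations can never hold simultaneously.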

Remark 2. We immediately notice that

  • If there’s a solution with value $1$, we can find it easily with $\mathbb F_2$ linear algebra.
  • It is always possible to get at least $\frac{1}{2}$ by selecting the better of the all-zeros and all-ones assignments.
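The second bullet holds because each three-term equation is satisfied by exactly one of the two constant assignments, so the better one satisfies at least half. A quick sketch checking this on random instances (the instance shape is illustrative):

```python
import random

def frac_satisfied(equations, bit):
    """Fraction of equations a+b+c = t (mod 2) satisfied when every
    variable equals `bit`, so each LHS evaluates to 3*bit mod 2."""
    return sum(1 for _, t in equations if (3 * bit) % 2 == t) / len(equations)

random.seed(0)
for _ in range(100):
    # Random instance: 20 equations in 10 variables, random targets.
    eqs = [(tuple(random.sample(range(10), 3)), random.randrange(2))
           for _ in range(20)]
    # all-zeros and all-ones satisfy complementary sets of equations
    assert max(frac_satisfied(eqs, 0), frac_satisfied(eqs, 1)) >= 0.5
```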

The theorem we will prove today is that these “obvious” observations are essentially the best ones possible! Our main result is that improving the above constants to 51% and 99%, say, is NP-hard.

Theorem 3 (Hardness of MAX-E3LIN)

For any $\varepsilon, \delta > 0$, the $\frac{1}{2}+\varepsilon$ vs. $1-\delta$ decision problem for MAX-E3LIN is NP-hard.

This means it is NP-hard to decide whether a MAX-E3LIN instance has value $\le \frac{1}{2}+\varepsilon$ or $\ge 1-\delta$ (given that it is one or the other). A direct corollary is that approximating MAX-SAT is also NP-hard.

Corollary 4. The $\frac78+\varepsilon$ vs. $1-\delta$ decision problem for MAX-SAT is NP-hard.

Remark 5. The constant $\frac78$ is optimal in light of a random assignment. In fact, one can replace the $1-\delta$ with $1$, but we don’t do so here.

Proof: Given an equation $a+b+c \equiv 1 \pmod 2$ in MAX-E3LIN, we consider the four formulas $a \lor b \lor c$, $a \lor \neg b \lor \neg c$, $\neg a \lor b \lor \neg c$, $\neg a \lor \neg b \lor c$. Either three or four of them are satisfied, with four occurring exactly when $a+b+c \equiv 1$. One does a similar construction for equations $a+b+c \equiv 0$. $\Box$
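The gadget can be checked by enumeration: for the clause set below (each clause negating an even number of variables), either three or four clauses hold, with all four holding exactly when $a+b+c$ is odd. A minimal sketch:

```python
from itertools import product

# The four clauses replacing the equation a + b + c = 1 (mod 2).
clauses = [
    lambda a, b, c: a or b or c,
    lambda a, b, c: a or not b or not c,
    lambda a, b, c: not a or b or not c,
    lambda a, b, c: not a or not b or c,
]

for a, b, c in product((0, 1), repeat=3):
    sat = sum(cl(a, b, c) for cl in clauses)
    # Either 3 or 4 clauses hold; all 4 exactly when a+b+c is odd.
    assert sat == (4 if (a + b + c) % 2 == 1 else 3)
```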

The hardness of MAX-E3LIN is relevant to the PCP theorem: using MAX-E3LIN gadgets, Håstad was able to prove a very strong version of the PCP theorem, in which the verifier reads just three bits of the proof!

Theorem 6 (Håstad PCP)

Let $\varepsilon, \delta > 0$. We have \[ \mathbf{NP} \subseteq \mathbf{PCP}_{\frac{1}{2}+\varepsilon, 1-\delta}(3, O(\log n)). \] In other words, any $L \in \mathbf{NP}$ has a (non-adaptive) verifier with the following properties.

  • The verifier uses $O(\log n)$ random bits, and queries just three (!) bits.
  • The acceptance condition is either $a+b+c=1$ or $a+b+c=0$.
  • If $x \in L$, then there is a proof $\Pi$ which is accepted with probability at least $1-\delta$.
  • If $x \notin L$, then every proof is accepted with probability at most $\frac{1}{2}+\varepsilon$.

2. Label Cover

We will prove our main result by reducing from LABEL-COVER. Recall how LABEL-COVER is played: we have a bipartite graph $G = U \cup V$, a set of keys $K$ for the vertices of $U$, and a set of labels $L$ for the vertices of $V$. For every edge $e = \{u,v\}$ there is a function $\pi_e \colon L \to K$ specifying a key $k = \pi_e(\ell) \in K$ compatible with each label $\ell \in L$. The goal is to assign a key to each vertex of $U$ and a label to each vertex of $V$ while maximizing the fraction of edges $e$ with compatible key-label pairs.
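To make the objective concrete, here is a hedged sketch evaluating the value of a key/label assignment on a toy instance (the instance and all names are illustrative, not part of the reduction):

```python
# Toy LABEL-COVER instance: U = {0}, V = {0, 1}, keys {0, 1}, labels {0, 1, 2}.
# For each edge e = (u, v), pi_e maps each label of v to its compatible key.
edges = {
    (0, 0): {0: 0, 1: 0, 2: 1},   # pi_e for edge {u=0, v=0}
    (0, 1): {0: 1, 1: 1, 2: 0},   # pi_e for edge {u=0, v=1}
}

def value(key_of, label_of):
    """Fraction of edges whose key-label pair is compatible."""
    good = sum(1 for (u, v), pi in edges.items()
               if pi[label_of[v]] == key_of[u])
    return good / len(edges)

# Keying u=0 with 0, labeling v=0 with 0 and v=1 with 2, satisfies both edges.
print(value({0: 0}, {0: 0, 1: 2}))  # 1.0
```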

Approximating LABEL-COVER is NP-hard:

Theorem 7 (Hardness of LABEL-COVER)

The $\eta$ vs. $1$ decision problem for LABEL-COVER is NP-hard for every $\eta > 0$, provided $|K|$ and $|L|$ are sufficiently large in terms of $\eta$.

So for any $\eta > 0$, it is NP-hard to decide whether one can satisfy all the edges or fewer than an $\eta$ fraction of them.

3. Setup

We are going to make a reduction of the following shape:

Reduction diagram for LABEL-COVER to MAX-E3LIN.

In words this means that

  • “Completeness”: If the LABEL-COVER instance is completely satisfiable, then the resulting MAX-E3LIN instance has value $\ge 1-\delta$.
  • “Soundness”: If the LABEL-COVER instance has value $\le \eta$, then the resulting MAX-E3LIN instance has value $\le \frac{1}{2}+\varepsilon$.

Thus given an oracle for MAX-E3LIN decision, we can solve the $\eta$ vs. $1$ decision problem for LABEL-COVER, which we know is hard.

The setup for this is quite involved, using a huge number of variables. Just to agree on some conventions:

Definition 8 (“Long Code”)

A $K$-indexed binary string $x = (x_k)_k$ is a $\pm 1$ sequence indexed by $K$; we can think of it as an element of $\{\pm1\}^K$. An $L$-indexed binary string $y = (y_\ell)_\ell$ is defined similarly.

Now we initialize $|U| \cdot 2^{|K|} + |V| \cdot 2^{|L|}$ variables:

  • At every vertex $u \in U$, we will create $2^{|K|}$ binary variables, one for every $K$-indexed binary string. It is better to collect these variables into a function $f_u \colon \{\pm1\}^K \to \{\pm1\}$.
  • Similarly, at every vertex $v \in V$, we will create $2^{|L|}$ binary variables, one for every $L$-indexed binary string, and collect these into a function $g_v \colon \{\pm1\}^L \to \{\pm1\}$.

    Picture:

    An edge in our setup.

Next we generate the equations. Here’s the motivation: we want to do this in such a way that given a satisfying labelling for LABEL-COVER, nearly all the MAX-E3LIN equations can be satisfied. One idea is as follows: for every edge $e$, letting $\pi = \pi_e$,

  • Take a $K$-indexed binary string $x = (x_k)_k$ at random. Take an $L$-indexed binary string $y = (y_\ell)_\ell$ at random.
  • Define the $L$-indexed binary string $z = (z_\ell)_\ell$ by $z = \left( x_{\pi(\ell)} y_\ell \right)_\ell$.
  • Write down the equation $f_u(x) g_v(y) g_v(z) = +1$ for the MAX-E3LIN instance.

Thus, assuming we had a valid labelling of the graph, we could let $f_u$ and $g_v$ be the dictator functions for the assigned keys and labels. In that case, $f_u(x) = x_{\pi(\ell)}$, $g_v(y) = y_\ell$, and $g_v(z) = x_{\pi(\ell)} y_\ell$, so the product is always $+1$.
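Concretely, the dictator identity can be checked by enumeration. A small sketch (the projection $\pi$ and the chosen label are illustrative):

```python
from itertools import product

K, L = 2, 3
pi = [0, 0, 1]            # illustrative projection pi: labels -> keys
ell = 1                   # v's label; u's key must be pi[ell] to be compatible
k = pi[ell]

f = lambda x: x[k]        # dictator function for u's key
g = lambda y: y[ell]      # dictator function for v's label

for x in product((1, -1), repeat=K):
    for y in product((1, -1), repeat=L):
        z = tuple(x[pi[l]] * y[l] for l in range(L))
        assert f(x) * g(y) * g(z) == 1   # the equation always holds
```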

Unfortunately, this has two fatal flaws:

  1. This means a $1$ instance of LABEL-COVER gives a $1$ instance of MAX-E3LIN; but value-$1$ instances of MAX-E3LIN are easy to solve (Remark 2), so we need $1-\delta$ to have a hope of working.
  2. Right now we could also just set all variables to be $+1$.

We fix both issues by using the following equations instead.

Definition 8 (Equations of reduction)

For every edge $e$, with $\pi = \pi_e$, we alter the construction and say:

  • Let $x = (x_k)_k$ and $y = (y_\ell)_\ell$ be random as before.
  • Let $n = (n_\ell)_\ell$ be a random $L$-indexed binary string, drawn from a $\delta$-biased distribution (each bit independently equals $-1$ with probability $\delta$). Now define $z = (z_\ell)_\ell$ by \[ z_\ell = x_{\pi(\ell)} y_\ell n_\ell. \] The $n_\ell$ are “noise” bits, which resolve the first problem by corrupting each bit of $z$ with probability $\delta$.

  • Write down one of the following two equations, with probability $\frac{1}{2}$ each:

\[
\begin{aligned}
f_u(x) g_v(y) g_v(z) &= +1 \\
f_u(x) g_v(y) g_v(-z) &= -1.
\end{aligned}
\]

This resolves the second issue.

This gives a set of $O(|E|)$ equations.

I claim this reduction works. So we need to prove the “completeness” and “soundness” claims above.

4. Proof of Completeness

Given a labeling of $G$ with value $1$, we simply let $f_u$ and $g_v$ be the dictator functions corresponding to this valid labelling. Then, since an equation fails only when the relevant noise bit is corrupted, we will pass $1-\delta$ of the equations.
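This can be verified exactly on a toy edge: with dictator functions, both equation types hold precisely when the noise bit at $v$'s label is $+1$. A sketch with exact arithmetic (the projection and parameters are illustrative):

```python
from fractions import Fraction
from itertools import product

K, L = 1, 2
pi = [0, 0]               # illustrative projection
ell, k = 0, 0             # a compatible label/key pair
delta = Fraction(1, 5)

f = lambda x: x[k]        # dictator functions
g = lambda y: y[ell]

prob = Fraction(0)
for x in product((1, -1), repeat=K):
    for y in product((1, -1), repeat=L):
        for n in product((1, -1), repeat=L):
            w = Fraction(1, 2 ** (K + L))           # uniform x, y
            for b in n:                              # delta-biased noise
                w *= (1 - delta) if b == 1 else delta
            z = tuple(x[pi[l]] * y[l] * n[l] for l in range(L))
            mz = tuple(-v for v in z)
            # each of the two equation types is written with prob 1/2
            sat = (Fraction(1, 2) * (f(x) * g(y) * g(z) == 1)
                   + Fraction(1, 2) * (f(x) * g(y) * g(mz) == -1))
            prob += w * sat

assert prob == 1 - delta    # dictators pass exactly 1 - delta of the mass
```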

5. A Fourier Computation

Before proving soundness, we will first need to explicitly compute the probability that an equation above is satisfied. Remember we generated an equation for $e$ based on the random strings $x$, $y$, $n$.

For $T \subseteq L$, we define

\[ \pi^{\text{odd}}_e(T) = \left\{ k \in K \mid \left\lvert \pi_e^{-1}(k) \cap T \right\rvert \text{ is odd} \right\}. \]

Thus $\pi^{\text{odd}}_e$ maps subsets of $L$ to subsets of $K$.

Remark 9. Note that $\left\lvert \pi^{\text{odd}}_e(T) \right\rvert \le |T|$ and that $\pi^{\text{odd}}_e(T) \neq \varnothing$ if $|T|$ is odd.

Lemma 10 (Edge Probability)

The probability that an equation generated for $e = \{u,v\}$ is true is

\[ \frac{1}{2} + \frac{1}{2} \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \widehat f_u\left(\pi^{\text{odd}}_e(T)\right). \]

Proof: Omitted for now… \Box
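Though the proof is omitted here, Lemma 10 can be verified mechanically on a tiny instance: compare the Fourier formula against exact enumeration of the randomness, over every possible $f$ and $g$. A sketch, assuming an illustrative toy setup with $|K| = 1$ and $|L| = 2$:

```python
from fractions import Fraction
from itertools import product

K, L = 1, 2
pi = [0, 0]                       # illustrative projection pi: labels -> keys
delta = Fraction(1, 4)

def subsets(n):
    return [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]

def chi(S, x):                    # character chi_S(x) = prod_{i in S} x_i
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier(h, n):                # h_hat(S) = E_x[h(x) chi_S(x)]
    pts = list(product((1, -1), repeat=n))
    return {S: Fraction(sum(h[x] * chi(S, x) for x in pts), 2 ** n)
            for S in subsets(n)}

def pi_odd(T):                    # keys hit an odd number of times by T
    return tuple(k for k in range(K)
                 if sum(1 for l in T if pi[l] == k) % 2 == 1)

def brute_force(f, g):            # exact P[equation holds] over x, y, n
    prob = Fraction(0)
    for x in product((1, -1), repeat=K):
        for y in product((1, -1), repeat=L):
            for n in product((1, -1), repeat=L):
                w = Fraction(1, 2 ** (K + L))
                for b in n:
                    w *= (1 - delta) if b == 1 else delta
                z = tuple(x[pi[l]] * y[l] * n[l] for l in range(L))
                mz = tuple(-v for v in z)
                prob += w * (Fraction(1, 2) * (f[x] * g[y] * g[z] == 1)
                             + Fraction(1, 2) * (f[x] * g[y] * g[mz] == -1))
    return prob

def formula(f, g):                # the right-hand side of Lemma 10
    fh, gh = fourier(f, K), fourier(g, L)
    s = sum((1 - 2 * delta) ** len(T) * gh[T] ** 2 * fh[pi_odd(T)]
            for T in subsets(L) if len(T) % 2 == 1)
    return Fraction(1, 2) + Fraction(1, 2) * s

def all_fns(n):                   # every function {+-1}^n -> {+-1}
    pts = list(product((1, -1), repeat=n))
    for signs in product((1, -1), repeat=len(pts)):
        yield dict(zip(pts, signs))

assert all(brute_force(f, g) == formula(f, g)
           for f in all_fns(K) for g in all_fns(L))
```

As a spot check, dictator $f$ and $g$ give probability $\frac12 + \frac12(1-2\delta) = 1 - \delta$, matching the completeness computation.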

6. Proof of Soundness

We will go in the reverse direction and show (constructively) that if the MAX-E3LIN instance has a solution with value $\ge \frac{1}{2}+2\varepsilon$, then we can reconstruct a solution to LABEL-COVER with value $\ge \eta$. (The use of $2\varepsilon$ here will be clear in a moment.) This process is called “decoding”.

The idea is as follows: if $S$ is a small set such that $\widehat f_u(S)$ is large, then we can pick a key from $S$ at random for $u$; compare this with the dictator functions, where $\widehat f_u(S) = 1$ and $|S| = 1$. We want to do something similar with $T$.

Here are the concrete details. Let $\Lambda = \frac{\log(1/\varepsilon)}{2\delta}$ and $\eta = \frac{\varepsilon^3}{\Lambda^2}$ be constants (the actual values arise later).

Definition 11. We say that a nonempty set $S \subseteq K$ of keys is heavy for $u$ if \[ |S| \le \Lambda \qquad\text{and}\qquad \left\lvert \widehat{f_u}(S) \right\rvert \ge \varepsilon. \] Note that there are at most $\varepsilon^{-2}$ heavy sets by Parseval.

Definition 12. We say that a nonempty set $T \subseteq L$ of labels is $e$-excellent for $v$ if

\[ |T| \le \Lambda \qquad\text{and}\qquad S = \pi^{\text{odd}}_e(T) \text{ is heavy.} \]

In particular $S \neq \varnothing$, so at least one compatible key-label pair lies in $S \times T$.

Notice that, unlike the case with $S$, the criterion for being excellent actually depends on the edge $e$ in question! This makes it easier to select keys than to select labels. In order to pick labels, we will have to choose from a $\widehat g_v^2$ distribution.

Lemma 13 (At least $\varepsilon$ of $T$ are excellent)

For any edge $e = \{u,v\}$, at least $\varepsilon$ of the possible $T$, drawn according to the distribution $\widehat g_v(T)^2$, are $e$-excellent.

Proof: Applying an averaging argument to the inequality

\[ \sum_{\substack{T \subseteq L \\ |T| \text{ odd}}} (1-2\delta)^{|T|} \widehat g_v(T)^2 \left\lvert \widehat f_u\left(\pi^{\text{odd}}_e(T)\right) \right\rvert \ge 2\varepsilon \]

shows there is at least an $\varepsilon$ chance that $|T|$ is odd and satisfies $(1-2\delta)^{|T|} \left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon$, where $S = \pi^{\text{odd}}_e(T)$. In particular, $(1-2\delta)^{|T|} \ge \varepsilon$, which is equivalent to $|T| \le \Lambda$, and hence $|S| \le |T| \le \Lambda$. Finally, by Remark 9 we have $S \neq \varnothing$, and $\left\lvert \widehat f_u(S) \right\rvert \ge \varepsilon$, so $S$ is heavy. $\Box$

Now, use the following algorithm.

  • For every vertex $u \in U$, take the union of all heavy sets, say

    \[ \mathcal H = \bigcup_{S \text{ heavy}} S. \] Pick a random key from $\mathcal H$. Note that $|\mathcal H| \le \Lambda\varepsilon^{-2}$, since there are at most $\varepsilon^{-2}$ heavy sets (by Parseval) and each has at most $\Lambda$ elements.

  • For every vertex $v \in V$, select a random set $T$ according to the distribution $\widehat g_v(T)^2$, and select a random element from $T$.

I claim that this works.

Fix an edge $e$. There is at least an $\varepsilon$ chance that $T$ is $e$-excellent. If it is, then there is at least one compatible pair in $\mathcal H \times T$. Hence the probability of success is at least

\[ \varepsilon \cdot \frac{1}{\Lambda \varepsilon^{-2}} \cdot \frac{1}{\Lambda} = \frac{\varepsilon^3}{\Lambda^2} = \eta. \]

Addendum: it’s been pointed out to me that this isn’t quite right; the hypothesis gives that the probability that the equation for an edge $e$ holds is $\ge \frac{1}{2}+\varepsilon$ on average over edges, but this doesn’t imply it for every edge. Thus one likely needs to do another averaging argument.